United States Government Accountability Office 



Report to the Subcommittee on 
Terrorism, Technology and Homeland 
Security, Committee on the Judiciary, 
U.S. Senate 



ESTIMATING THE 
UNDOCUMENTED 
POPULATION 



A "Grouped Answers" 
Approach to 
Surveying Foreign- 
Born Respondents 



GAO

Accountability * Integrity * Reliability

GAO-06-775 



September 2006 




Highlights 

Highlights of GAO-06-775, a report to the 
Subcommittee on Terrorism, Technology 
and Homeland Security, Committee on the 
Judiciary, U.S. Senate 



ESTIMATING THE UNDOCUMENTED 
POPULATION 

A "Grouped Answers" Approach to 
Surveying Foreign-Born Respondents 



Why GAO Did This Study 

As greater numbers of foreign-born 
persons enter, live, and work in the 
United States, policymakers need 
more information — particularly on 
the undocumented population, its 
size, characteristics, costs, and 
contributions. This report reviews 
the ongoing development of a 
potential method for obtaining 
such information: the "grouped 
answers" approach. In 1998, GAO 
devised the approach and 
recommended further study. In 
response, the Census Bureau tested 
respondent acceptance and 
recently reported results. 

GAO answers four questions. 

(1) Is the grouped answers approach acceptable for use in a national survey of the foreign-born?

(2) What further research may be needed?

(3) How large a survey is needed?

(4) Are any ongoing surveys appropriate for inserting a grouped answers question series (to avoid the cost of a new survey)?

For this study, GAO consulted an 
independent statistician and other 
experts, performed test 
calculations, obtained documents, 
and interviewed officials and staff 
at federal agencies. 

The Census Bureau and DHS 
agreed with the main findings of 
this report. DHHS agreed that the 
National Survey on Drug Use and
Health is not an appropriate survey 
for inserting a grouped answers 
question series. 



What GAO Recommends 



GAO makes no new 
recommendations in this report. 

www.gao.gov/cgi-bin/getrpt?GAO-06-775.

To view the full product, including the scope 
and methodology, click on the link above. 
For more information, contact Nancy R. 
Kingsbury at (202) 512-2700 or 
kingsburyn@gao.gov. 



What GAO Found 

The grouped answers approach is designed to ask foreign-born respondents 
about their immigration status in a personal-interview survey. Immigration 
statuses are grouped in Boxes A, B, and C on two different flash cards — with the 
undocumented status in Box B. Respondents are asked to pick the box that 
includes their current status and are told, "If it's in Box B, we don't want to 
know which specific category applies to you." 

A random half of respondents are shown the card on the left of the figure (Card 
1), resulting in estimates of the percentage of the foreign-born population who 
are in each box of that card. The other half of the respondents are shown the 
card on the right, resulting in corresponding estimates for slightly different 
boxes. (No one sees both cards.) The percentage undocumented is estimated 
by subtraction: The percentage of the foreign-born who are in Box B of one card 
minus the percentage who are in Box A of the other card. 
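The subtraction can be sketched in a few lines of Python. This is a minimal, unweighted illustration under stated assumptions: each subsample's answers arrive as a list of box labels ("A", "B", "C"), and the function names and response counts are hypothetical. A real survey would apply sampling weights, which this sketch omits.

```python
from collections import Counter

BOXES = ("A", "B", "C")

def box_shares(responses):
    """Unweighted share of one subsample's respondents who picked each box."""
    if not responses:
        raise ValueError("subsample has no responses")
    counts = Counter(responses)
    return {box: counts[box] / len(responses) for box in BOXES}

def estimate_undocumented_share(card1_responses, card2_responses):
    """Indirect estimate: share in Box B of Card 1 minus share in Box A of Card 2.

    Card 1's Box B holds the same legal statuses as Card 2's Box A plus the
    currently "undocumented" category, so the difference isolates that category.
    No individual response is ever classified as undocumented.
    """
    return box_shares(card1_responses)["B"] - box_shares(card2_responses)["A"]


# Hypothetical subsamples of 100 respondents each (illustrative counts only).
card1 = ["A"] * 40 + ["B"] * 50 + ["C"] * 10
card2 = ["A"] * 30 + ["B"] * 60 + ["C"] * 10
print(estimate_undocumented_share(card1, card2))  # about 0.20
```

The estimate is computed from aggregate shares only. No single answer identifies anyone as undocumented.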

Immigration Status Cards 1 and 2

Card 1

Box A
• United States citizen
• Student, work, business or tourist visa

Box B
• Legal permanent resident (with a valid and official green card issued to me by the U.S. government)
• Refugee or asylee (approved, not applicant)
• Currently "undocumented" (Right now, I do not have a currently valid, legal U.S. immigration status)

Box C
• TPS*, parolee, or some other category (Not in Box A or Box B)

Card 2

Box A
• Legal permanent resident (with a valid and official green card issued to me by the U.S. government)
• Refugee or asylee (approved, not applicant)

Box B
• United States citizen
• Student, work, business or tourist visa
• Currently "undocumented" (Right now, I do not have a currently valid, legal U.S. immigration status)

Box C
• TPS*, parolee, or some other category (Not in Box A or Box B)

*Temporary Protected Status

Sources: GAO; Corel Draw (flag and suitcase); DHS (resident alien cards). (The actual size of each card is 8-1/2" by 11".)

The grouped answers approach is acceptable to many experts and immigrant 
advocates — with certain conditions, such as (for some advocates) private sector 
data collection. 

Most respondents tested did not object to picking a box. Research is needed to 
assess issues such as whether respondents pick the correct box. A sizable 
survey — roughly 6,000 or more respondents — would be needed for 95 percent 
confidence and a margin of error of (plus or minus) 3 percentage points. The 
ongoing surveys that GAO identified are not appropriate for collecting data on 
immigration status. (For example, one survey takes names and Social Security 
numbers, which might affect acceptance of immigration status questions.) 
Whether further research or implementation in a new survey would be justified 
depends on how policymakers weigh the need for such information against 
potential costs and the uncertainties of future research. 




Contents

Letter

Results in Brief

Background

Experts Seem to Accept "Grouped Answers" Questions If Fielded by a Private Sector Organization

Various Tests Are or May Be Needed

Some 6,000 Foreign-Born Respondents Are Needed for "Reasonably Precise" Estimates of the Undocumented

The Most Efficient Field Strategy Does Not Seem Feasible

Observations

Agency Comments



Appendix I Scope and Methodology

Appendix II Estimating Characteristics, Costs, and Contributions of the Undocumented Population

Appendix III A Review of Census Bureau and GAO Reports on the Field Test of the Grouped Answer Method

Appendix IV A Brief Examination of Responses Observed while Testing an Indirect Method for Obtaining Sensitive Information

Appendix V The Issue of Informed Consent

Appendix VI A Note on Variances and "Mirror Image" Estimates

Appendix VII Comments from the Department of Commerce




Appendix VIII Comments from the Department of Homeland Security

Appendix IX Comments from the Department of Health and Human Services

Appendix X GAO Contact and Staff Acknowledgments

Bibliography


Tables

Table 1: Approximate Number of Foreign-Born Respondents Needed to Estimate Percentage Undocumented within 2, 3, or 4 Percentage Points at 90 Percent Confidence Level, Using Two-Card Grouped Answers Data

Table 2: Approximate Number of Foreign-Born Respondents Needed to Estimate Percentage Undocumented, within 2, 3, or 4 Percentage Points, at 95 Percent Confidence Level, Using Two-Card Grouped Answers Data

Table 3: Survey Appropriateness: Whether Surveys Meet Criteria Based on the Grouped Answers Design

Table 4: Survey Appropriateness: Whether Surveys Meet Table 3 (Design Based) Criteria and Additional Criteria Based on Immigrant Advocates' Views

Table 5: Experts GAO Consulted on Immigration Issues or Immigration Studies






Figures

Figure 1: Immigration Status Card 1, Grouped Answers

Figure 2: Immigration Status Card 2

Figure 3: Cards 1 and 2 Compared

Figure 4: SIPP Flash Card

Figure 5: Training Card 1

Figure 6: Training Card 2

Figure 7: Immigration Status Card Tested in GSS



Abbreviations 



ACS American Community Survey 

BLS Bureau of Labor Statistics 

CASI Computer Assisted Self Interview 

CPS Current Population Survey 

DHS Department of Homeland Security 

GSS General Social Survey 

HHS Department of Health and Human Services 

INS Immigration and Naturalization Service 

NAWS National Agricultural Workers Survey 

NCHS National Center for Health Statistics 

NHIS National Health Interview Survey 

NORC National Opinion Research Center 

NRC National Research Council 

NSDUH National Survey on Drug Use and Health 

NSF National Science Foundation 

OMB Office of Management and Budget 

SAMHSA Substance Abuse and Mental Health Services Administration 

SIPP Survey of Income and Program Participation 



This is a work of the U.S. government and is not subject to copyright protection in the 
United States. It may be reproduced and distributed in its entirety without further 
permission from GAO. However, because this work may contain copyrighted images or 
other material, permission from the copyright holder may be necessary if you wish to 
reproduce this material separately. 




United States Government Accountability Office 
Washington, DC 20548 



September 29, 2006 

The Honorable Jon Kyl
Chairman 

The Honorable Dianne Feinstein 
Ranking Minority Member 
Subcommittee on Terrorism, Technology and Homeland Security
Committee on the Judiciary 
United States Senate 

As greater numbers of foreign-born persons enter, live, and work in the 
United States, policymakers and the general public increasingly place high 
priority on issues involving immigrants. Because separate policies, laws, 
and programs apply to different immigration statuses, valid and reliable 
information is needed for populations defined by immigration status. 
However, government statistics generally do not include such information. 

The information most difficult to obtain concerns the size, characteristics, 
costs, and contributions of the population referred to in this report as 
undocumented or currently undocumented. 1 Such information is needed 
because, for example, large numbers of undocumented persons arrive 
each year, and the Census Bureau has realized that information on the size 
of the undocumented population would help estimate the size of the total 
U.S. population, especially for years between decennial censuses. 2 More 



1 Our previous reports and those of other government agencies have sometimes used the terms undocumented, illegal aliens, illegal immigrants, unauthorized immigrants, and not legally present. We use undocumented here because this report concerns a technique for surveying the foreign-born, and an ongoing federally funded survey uses this term as a response category when asking about legal status. We define undocumented as foreign-born persons who are illegally present in the United States. Foreign-born persons (that is, persons not born as U.S. citizens) were born outside the United States to parents neither of whom was a U.S. citizen at the time of the birth.

2 Most recently, the Census Bureau has stated that among its "enhancement priorities" to 
"improve estimates of net international migration" are efforts to research ways of 
estimating "international migrants by migrant status (legal migrants, temporary migrants, 
quasi-legal migrants, unauthorized migrants, and emigrants)" with the overall purpose of 
producing annual estimates of the U.S. population. ("The U.S. Census Bureau's Intercensal 
Population Estimates and Projections Program: Basic Underlying Principles," paper 
distributed by the Census Bureau at its conference on Population Estimates: Meeting User 
Needs, Alexandria, Virginia, July 19, 2006.) 






generally, information about the undocumented population — and about 
changes in that population — can contribute to policy-related planning and 
evaluation efforts. 

As you know, in 1998, we devised an approach to surveying foreign-born 
respondents about their immigration status. 3 This self-report, personal- 
interview approach groups answers so that no respondent is ever asked 
whether he, she, or anyone else is undocumented. In fact, no individual 
respondent is ever categorized as undocumented. Logically, however, 
grouped answers data can provide indirect estimates of the undocumented 
population. Generally, grouped answers questions on immigration status 
would be asked as part of a larger survey that includes direct questions on 
demographic characteristics and employment and might include questions 
on school attendance, use of medical facilities, and so forth; some surveys 
also ask specific questions that can help estimate taxes paid. Potentially, 
combining the answers to such questions with grouped answers data can 
provide further information on the characteristics, costs, and 
contributions of the undocumented population. 

We reported the first results of preliminary tests of the grouped answers 
approach, primarily with Hispanic farmworkers, in 1998 and 1999; the 
majority of the preliminary test interviews were fielded by Aguirre 
International of Burlingame, California. 4 We also recommended that the 
Immigration and Naturalization Service (INS) and the Census Bureau 
further develop and test the method. In response, the Census Bureau 
contracted for a test as part of the 2004 General Social Survey (GSS), 
which is fielded by the National Opinion Research Center (NORC) at the 
University of Chicago, with "core funding" provided by a grant from the 



3 GAO, Immigration Statistics: Information Gaps, Quality Issues Limit Utility of Federal
Data to Policymakers, GAO/GGD-98-164 (Washington, D.C.: July 31, 1998), and Survey 
Methodology: An Innovative Technique for Estimating Sensitive Survey Items, 
GAO/GGD-00-30 (Washington, D.C.: November 1999). 

4 See GAO/GGD-98-164 and GAO/GGD-00-30. 






National Science Foundation (NSF). 5 The Census Bureau's analysis of the 
2004 GSS data became available in 2006. 

In this report, we respond to your request that we review the ongoing 
development of the grouped answers approach and related issues. We 
address four questions: (1) Is the grouped answers approach "acceptable" 
for use in a national survey of the foreign-born population? 6 (2) What kinds 
of further research are or may be needed, based on the results of tests 
conducted thus far and expert opinion? (3) How large a survey is needed 
to provide "reasonably precise" estimates of the undocumented 
population, using grouped answers data? (4) Are there appropriate 
ongoing surveys in which the grouped answers question series might 
eventually be inserted (thus avoiding the costs of fielding a new survey)? 

To answer these questions, we 

• consulted private sector experts in immigration issues and studies, 
including immigrant advocates, immigration researchers, and 
others; 7 



5 The GSS is a long-standing series of nationally representative personal-interview self- 
report surveys, each consisting of a "core" question series and additional "modules." The 
funding for fielding the core question series is provided by a grant from NSF. The modules 
are question series added through grants from and contracts with a variety of sources. The 
Census Bureau contracted for a grouped answers module in the 2004 GSS. The bulk of the 
funding for that Census-GSS contract had been provided to the Census Bureau by the 
Department of Homeland Security (DHS). This test of the grouped answers approach was 
in response to our earlier recommendation in GAO/GGD-98-164. 

6 The acceptability of the grouped answers approach for use in a national survey is defined 
here primarily in terms of (1) the responses of immigrant advocates when the grouped 
answers approach is explained to them (that is, objecting versus not objecting to or 
accepting the method) and (2) respondents' tendency to pick a box when the grouped 
answers immigration status question is posed to them (rather than their refusing or saying 
that they "don't know"). The opinions of other experts — for example, those who have 
conducted studies of immigrants — are also relevant, as are interviewer judgments about 
respondent reactions. 

7 In all, we consulted over 20 private sector immigration experts (listed in appendix I, table 
5). Because of the importance of immigrant advocates' views on the issues in surveying 
immigrants, table 5 identifies the experts representing immigrant advocate organizations. 
For purposes of this report, we define immigrant advocate organizations as those whose 
purpose includes representing the immigrants' point of view. More generally, in reporting 
the views of the experts we consulted, we recognize that in some cases other 
knowledgeable persons might have differing views. 






• consulted an independent statistical expert, Dr. Alan Zaslavsky, and 
other experts in statistics and surveys; 8 

• reanalyzed the data from the 2004 GSS test and subjected both our 
analysis and the Census Bureau's analysis to review by the 
independent statistical expert; 

• performed test calculations, using specific assumptions; and 

• identified ongoing surveys that might be candidates for 
piggybacking the grouped answers question series, gathered 
documents on those surveys, and met with officials and staff at the 
federal agencies that conduct or sponsor them. 9 

We also met with other relevant federal agencies. 10 Appendix I describes 
our methodology and the scope of our work in more detail. We conducted 
our work in accordance with generally accepted government auditing 
standards between July 2005 and September 2006. 



8 Alan Zaslavsky is Professor of Statistics, Department of Health Care Policy, Harvard
Medical School, Boston, Massachusetts. We selected Dr. Zaslavsky because he (1) is 
independent with respect to the method we discuss; (2) is a noted statistician who has 
received many awards, has advised multiple executive agencies on the design and analysis 
of large-scale surveys, and serves on the National Research Council's (NRC) Committee for 
National Statistics at the National Academy of Sciences; and (3) has developed innovative 
statistical approaches. We also sought the advice of two other noted statisticians who had 
advised us in earlier work on this method (Dr. Fritz Scheuren and Dr. Mary Grace Kovar of 
NORC at the University of Chicago) and GAO colleagues with expertise in statistics. 

9 We talked with four agencies sponsoring or conducting these surveys: the Census Bureau 
in the Department of Commerce, the Bureau of Labor Statistics (BLS) in the Department of 
Labor, and the National Center for Health Statistics (NCHS) and the Substance Abuse and 
Mental Health Services Administration (SAMHSA) in the Department of Health and Human 
Services (HHS). Survey-related staff at these agencies provided information on the specific 
surveys. Additionally, we deemed some staff at these agencies to be experts in statistics 
and survey research. 

10 These included the Statistical and Science Policy Branch of the Office of Information and 
Regulatory Affairs in the Office of Management and Budget (OMB), the Employment and 
Training Administration in the Department of Labor (DOL), and the Office of Immigration 
Statistics within the Policy Directorate and the Research and Evaluation Division, Office of 
Policy and Strategy, U.S. Citizenship and Immigration Services in the Department of 
Homeland Security (DHS). 






Results in Brief

Acceptance of the grouped answers approach appears to be high among immigrant advocates and respondents. The advocates we interviewed generally accepted the approach, with provisos such as fielding by a university or other private sector organization, appropriate data protection (including protections against government misuse), and high-quality survey procedures. The independent statistician, reviewing the Census Bureau's analysis and our reanalysis of the 2004 GSS test of respondent acceptance, concluded that the grouped answers approach is "generally usable" for surveys interviewing foreign-born respondents in their homes. 11

Based on the results of the GSS test and on consultations and interviews 
with varied experts, further work is or may be needed to 



• Expand knowledge about respondent acceptance. For example, 
the 2004 GSS test did not cover persons who are "linguistically 
isolated" in the sense that no member of their household age 14 or 
older speaks English "very well". 12 



• Test the accuracy of responses or respondents' intent to answer 
accurately. 13 To date, no tests of response accuracy, or the intent to 
answer accurately, have been conducted, although a number of 
relevant designs can be identified. 

Thousands of foreign-born respondents would be needed to obtain 
"reasonably precise" grouped answers estimates of the undocumented 



11 Our reanalysis differed from the Census Bureau's in that we eliminated 19 GSS cases that
we deemed ineligible because, for example, interviewing took place over the telephone 
rather than in person, as required by the grouped answers approach; we found that 6 
respondents of more than 200 failed to provide usable, specific answers. 

12 The GSS allowed bilingual household members to help respondents with limited English 
skills. Our earlier testing with farmworkers was conducted in Spanish, but no testing has 
covered linguistically isolated non-Hispanic respondents. About 4 percent of the foreign- 
born population both (1) does not speak Spanish and (2) is linguistically isolated (that is, 
is part of a household in which no member age 14 or older speaks English "very well"). 
Although this may seem a small percentage, it is possible that non-Hispanic undocumented 
persons are concentrated in this group. 

13 The distinction between accurate responses and the intent to answer accurately is 
necessary because some respondents may mistakenly think that they are, for example, in 
a legal status. 






population. 14 Our calculations and work with statisticians showed that, while many factors are involved and no specific level of precision can be guaranteed, roughly 6,000 interviews would likely be sufficient to support estimates of the size of the undocumented population and of major subgroups within it (especially high-risk subgroups, defined by characteristics such as age (18 to 40), recent arrival, and employment 15 ). Estimates of other quantities are also possible; for example, given appropriate program data, major program costs associated with the undocumented population may be estimated.
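A back-of-the-envelope calculation shows why thousands of interviews are needed. It treats the estimate as a difference between two independent proportions, one from each subsample of about n/2 respondents. The sketch rests on assumptions that do not come from this report: simple random sampling within each subsample, a normal approximation, a `design_effect` multiplier standing in for the effect of clustered personal-interview sampling, and hypothetical box shares chosen by the user. It is not intended to reproduce the figures in this report.

```python
import math
from statistics import NormalDist

def required_total_sample(p_b_card1, p_a_card2, margin, confidence=0.95, design_effect=1.0):
    """Approximate total respondents (both subsamples combined) needed so the
    subtraction estimate p_b_card1 - p_a_card2 has the given margin of error
    (a proportion, e.g. 0.03 for plus or minus 3 percentage points).

    Assumes two independent subsamples of n/2 each, simple random sampling
    within each, and a normal approximation. design_effect inflates the
    variance for clustering (a hypothetical input, not a value from the report).
    """
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    # Var(difference) = design_effect * [p1(1 - p1) + p2(1 - p2)] / (n / 2)
    binomial_terms = p_b_card1 * (1 - p_b_card1) + p_a_card2 * (1 - p_a_card2)
    # Set z * sqrt(Var) equal to the margin and solve for n.
    n = 2 * design_effect * z ** 2 * binomial_terms / margin ** 2
    return math.ceil(n)


# Hypothetical shares: 50 percent in Box B of Card 1, 25 percent in Box A of Card 2.
for conf in (0.90, 0.95):
    for moe in (0.02, 0.03, 0.04):
        print(conf, moe, required_total_sample(0.50, 0.25, moe, conf))
```

The variance terms for both boxes add, and any clustering raises the variance further. The required total therefore grows quickly as the margin narrows: halving the margin roughly quadruples n.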

None of the ongoing, large-scale national surveys we identified appears to be appropriate for piggybacking the grouped answers question series. One self-report personal-interview survey is fielded by a private sector organization (under a contract with a Department of Health and Human Services (HHS) agency); however, that survey focuses on the use of illegal drugs, and we believe that direct questions on drug use might heighten the sensitivity of the questions on immigration status. We believe the other ongoing surveys are also inappropriate; for example, one asks other sensitive questions (on HIV status) and takes respondents' names and Social Security numbers. Additionally, these other surveys are fielded by the Census Bureau, a government agency, rather than by a private sector organization.

Whether further research or a new survey would be justified depends on 
issues such as how policymakers weigh the need for such information 
against potential costs. 

We received comments on a draft of this report from the Department of 
Commerce (Census Bureau), the Department of Homeland Security 
(DHS), and the Department of Health and Human Services (DHHS). The 
Census Bureau and DHS generally agreed with the main findings of the 
report, and DHHS agreed that the National Survey on Drug Use and Health
would not be appropriate for "piggy-backing" the grouped answers 
question series. These agencies also provided other technical comments 
(see appendices VII, VIII, and IX). 



14 We define "reasonably precise" as a 90 percent or 95 percent confidence interval spanning
plus or minus 2 to 4 percentage points. A 90 percent or 95 percent confidence interval is the 
interval within which the parameter in question would be expected to fall 90 percent or 95 
percent of the time, if the sampling and interval estimation procedures were repeated in an 
infinite number of trials. 

15 In many cases, the method would not be suitable for low-risk subgroups. {High-risk and 
low-risk refer to subgroups with above-average and below-average percentages of 
undocumented persons, respectively.) 






Background 



Grouped Answers Reduce 
"Question Threat" and 
Allow Indirect Estimates 
of the Undocumented 



Survey questions about sensitive topics carry a "threat" for some 
respondents, because they fear that a truthful answer could result in some 
degree of negative consequence (at a minimum, social disapproval). The 
grouped answers approach is designed to reduce this threat when asking 
about immigration status. 

Three key points about the grouped answers approach are that 

1. no respondent is ever asked whether he or she, or anyone else, is 
undocumented; 

2. two pieces of information are separately provided by two subsamples 
of respondents (completely different people — no one is shown both 
immigration status cards); and 

3. taking the two pieces of information together — like two different 
pieces of a puzzle — allows indirect estimation of the undocumented 
population, but no individual respondent (and no piece of data on an 
individual respondent) is ever categorized as undocumented. 

We discuss each point in some detail. 16 



1. No respondent is ever asked whether he or she is in the 
undocumented category. Unlike questions that ask respondents to 
choose among specific answer categories, the grouped answers approach 
combines answer categories in sets or "boxes," as shown in figure 1. 



16 The grouped answers approach derives from (1) the residual method described by Henry S. Shryock and Jacob S. Siegel and Associates, The Methods and Materials of Demography (Washington, D.C.: U.S. Government Printing Office, 1980), and Robert Warren and Jeffrey S. Passel, "A Count of the Uncountable: Estimates of Undocumented Aliens Counted in the 1980 Census," Demography, 24:3 (1987): 375-93, and (2) earlier indirect survey-based techniques, such as "randomized response" (see Stanley Warner, "A Survey Technique for Eliminating Evasive Answer Bias," Journal of the American Statistical Association, 60 (1965): 63-69, and Bernard Greenberg and others, "The Unrelated Questions Randomized Response Model: Theoretical Framework," Journal of the American Statistical Association, 64 (1969): 520-39).






Figure 1: Immigration Status Card 1, Grouped Answers

Box A
• United States citizen
• Student, work, business or tourist visa (I am not in violation of admission period limits or work restrictions)

Box B
• Legal permanent resident (with a valid and official green card issued to me by the U.S. government)
• Refugee or asylee (approved, not applicant)
• Currently "undocumented" (Right now, I do not have a currently valid, legal U.S. immigration status)

Box C
• TPS*, parolee, or some other category (Not in Box A or Box B)

*Temporary Protected Status

Sources: GAO; Corel Draw (flag and suitcase); DHS (resident alien cards). (The actual size of the card is 8-1/2" by 11".)






Box B includes the sensitive answer category — currently 
"undocumented" — along with other categories that are nonsensitive. 17 

Each respondent is asked to "pick the Box" — Box A, Box B, or Box C — 
that contains the specific answer category that applies to him or her. 
Respondents are told, in effect: If the specific category that applies to you 
is in Box B, we don't want to know which one it is, because right now we 
are focusing on Box A categories. 18 

By using the boxes, the interview avoids "zeroing in" on the sensitive 
answer. The specific categories shown in the boxes in figure 1 are grouped 
so that 

• one would expect many respondents who are here legally, as well 
as those who are undocumented, to choose Box B, 19 and 

• there is virtually no possibility of anyone deducing which specific 
category within Box B applies to any individual respondent. 

2. Two pieces of information are provided separately by two 
subsamples of respondents (no one is shown both immigration 
status cards). Respondents are divided into two subsamples, based on 
randomization procedures or rotation (alternation) procedures conducted 
outside the interview process. (For example, a rotation procedure might 
specify that within an interviewing area, every other household will be 
designated as subsample 1 or subsample 2.) 
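The two assignment procedures mentioned above, rotation and randomization, might be sketched as follows. The household identifiers, function names, and card labels are hypothetical. The point is only that assignment happens outside the interview process. Subsample 1 is then shown Card 1, and subsample 2 is shown Card 2.

```python
import random

def assign_by_rotation(household_ids):
    """Rotation (alternation): within an interviewing area, households alternate
    between subsample 1 and subsample 2 in listing order."""
    return {hh: 1 if i % 2 == 0 else 2 for i, hh in enumerate(household_ids)}

def assign_at_random(household_ids, seed=None):
    """Randomization: shuffle the households, then put the first half in
    subsample 1 and the rest in subsample 2 (sizes differ by at most one)."""
    rng = random.Random(seed)
    shuffled = list(household_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {hh: 1 if i < half else 2 for i, hh in enumerate(shuffled)}

CARD_FOR_SUBSAMPLE = {1: "Immigration Status Card 1", 2: "Immigration Status Card 2"}

# Hypothetical household listing for one interviewing area.
households = ["hh-101", "hh-102", "hh-103", "hh-104", "hh-105"]
for hh, subsample in assign_by_rotation(households).items():
    print(hh, CARD_FOR_SUBSAMPLE[subsample])
```

Either way, each household lands in exactly one subsample, so no respondent is ever shown both cards.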



17 Note that Box B in figure 1 uses the term currently "undocumented," with quotation marks around undocumented. We believe this wording may help communicate with
undocumented respondents who either (1) had a legal status in the past (for example, 
entered with a temporary visa but have now overstayed and thus lost their legal status) or 
(2) are likely to acquire a legal status in the near future (for example, entered illegally and 
applied for legal status but have not yet received it). Potentially, the quotation marks might 
help communicate with respondents who have some kind of document (for example, a 
"matricula card" issued by the Mexican government) but who do not have a valid legal 
immigration status that allows U.S. residence. 

18 In the test with Hispanic farmworkers, interviewers explained: "Because we're using the 
boxes — we WON'T 'zero in' on anything somebody might not want to tell us." 

19 In the future, changes in the percentages of foreign-born persons in various statuses might warrant changes in how statuses are grouped across the boxes. Additionally, the specific legal statuses defined by law might change, requiring a change in the legal statuses shown on the cards.






This "split sample" procedure has been used routinely for many surveys 
over the years. As applied to the grouped answers approach, the two 
subsamples are shown alternative flash cards. Immigration Status Card 1, 
described above, represents one way to group immigration statuses in 
three boxes. A second immigration status flash card (Immigration Status 
Card 2, shown in figure 2) groups the same statuses differently. 






Figure 2: Immigration Status Card 2

Box A
• Legal permanent resident (with a valid and official green card issued to me by the U.S. government)
• Refugee or asylee (approved, not applicant)

Box B
• United States citizen
• Student, work, business or tourist visa (I am not in violation of admission period limits or work restrictions)
• Currently "undocumented" (Right now, I do not have a currently valid, legal U.S. immigration status)

Box C
• TPS*, parolee, or some other category (Not in Box A or Box B)

*Temporary Protected Status

Sources: GAO; Corel Draw (flag and suitcase); DHS (resident alien cards). (The actual size of the card is 8-1/2" by 11".)



The alternative immigration-status cards can be thought of as "mirror 
images" in that 

• the two nonsensitive legal statuses in Box A of Card 1 appear in Box 
B of Card 2 and 

• the two nonsensitive legal statuses in Box B of Card 1 appear in Box 
A of Card 2. 

However, the undocumented status always appears in Box B. 
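The mirror-image relationship can be checked mechanically. The Python sketch below is illustrative only: it encodes the box contents of Cards 1 and 2 as shown in figure 3 (the constant names and the `is_mirror_pair` function are hypothetical, not part of any survey system) and confirms that the nonsensitive statuses swap boxes while the undocumented category stays in Box B. Frozensets are used because box contents are unordered groups.

```python
# Hypothetical labels for the answer categories on the immigration status cards.
CITIZEN = "United States citizen"
VISA = "Student, work, business or tourist visa"
LPR = "Legal permanent resident"
REFUGEE = "Refugee or asylee"
UNDOCUMENTED = 'Currently "undocumented"'
OTHER = "TPS, parolee, or some other category"

# Box contents of each card, as shown in figure 3.
CARD_1 = {
    "A": frozenset({CITIZEN, VISA}),
    "B": frozenset({LPR, REFUGEE, UNDOCUMENTED}),
    "C": frozenset({OTHER}),
}
CARD_2 = {
    "A": frozenset({LPR, REFUGEE}),
    "B": frozenset({CITIZEN, VISA, UNDOCUMENTED}),
    "C": frozenset({OTHER}),
}


def is_mirror_pair(card1, card2, sensitive):
    """Check the mirror-image structure of two alternative cards."""
    return (
        sensitive in card1["B"] and sensitive in card2["B"]  # sensitive status always in Box B
        and card2["B"] - {sensitive} == card1["A"]           # Box A of Card 1 moves to Box B of Card 2
        and card2["A"] == card1["B"] - {sensitive}           # Box B of Card 1 moves to Box A of Card 2
        and card1["C"] == card2["C"]                         # Box C is the same on both cards
    )


print(is_mirror_pair(CARD_1, CARD_2, UNDOCUMENTED))  # True
# The only category in Box B of Card 1 that is not in Box A of Card 2:
print(CARD_1["B"] - CARD_2["A"])  # frozenset({'Currently "undocumented"'})
```

The last line anticipates the subtraction logic: because the two boxes differ by exactly one category, the difference in their population shares isolates that category.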






Interviewers ask survey respondents in subsample 1 about immigration 
status with respect to Card 1. They ask survey respondents in subsample 
2 (completely different persons) about immigration status with respect to 
Card 2. Each respondent is shown one and only one immigration-status 
flash card. There are no highly unusual or complicated interviewing 
procedures. 20 

Because the two subsamples of respondents are drawn randomly or by 
rotation, each subsample represents the foreign-born population and, if 
sufficiently large, can provide "reasonably precise" estimates of the 
percentages of the foreign-born population in the boxes on one of the 
alternative cards. 
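Assigning respondents to the two subsamples can be sketched as follows. This is a minimal illustration, assuming a simple list of respondent identifiers; the function name `assign_cards` and its parameters are hypothetical, and an actual survey would embed this step in its own sampling and field-management procedures. "Rotation" alternates cards in the order respondents are reached; "random" shuffles respondents and splits them into halves.

```python
import random


def assign_cards(respondent_ids, method="random", seed=None):
    """Assign each respondent to exactly one immigration status card (1 or 2)."""
    ids = list(respondent_ids)
    if method == "rotation":
        # Alternate cards in contact order: 1, 2, 1, 2, ...
        return {rid: 1 + (i % 2) for i, rid in enumerate(ids)}
    if method == "random":
        rng = random.Random(seed)  # seeded generator so the assignment is reproducible
        shuffled = ids[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        # First half of the shuffled list sees Card 1; the rest see Card 2.
        return {rid: 1 if i < half else 2 for i, rid in enumerate(shuffled)}
    raise ValueError(f"unknown method: {method!r}")


assignment = assign_cards(range(1, 11), method="random", seed=2006)
card_1_group = [rid for rid, card in assignment.items() if card == 1]
card_2_group = [rid for rid, card in assignment.items() if card == 2]
print(len(card_1_group), len(card_2_group))  # 5 5
```

Either method gives each respondent one and only one card, which is the property the estimation procedure relies on.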

Incidentally, a respondent picking a box that does not include the sensitive 
answer — for example, a respondent picking Box A or Box C in figure 1 — 
can be asked follow-up questions that pinpoint the specific answer 
category that applies to him or her. Thus, direct information is obtained on 
all legal immigration statuses. The data on some of the legal categories can 
be compared to administrative data to check the reasonableness of 
responses. Additionally, these data provide estimates of legal statuses, 
which are useful when, for example, policymakers review legislation on 
the numbers of foreign-born persons who may be admitted to this country 
under specific legal status programs. 

3. No individual respondent is ever categorized as undocumented, 
but indirect estimates of the undocumented population can be 
made. Using two slightly different pieces of information provided by the 
two different subsamples allows indirect estimation of the size of the 
currently undocumented population — by simple subtraction. 

The only difference between Box B of Card 1 and Box A of Card 2 is the 
inclusion of the currently "undocumented" category in Box B of Card 1. 
Figure 3 shows both cards together for easy comparison. 



20 Unlike some other indirect estimation techniques, the grouped answers approach does 
not require unusual stratagems as part of the survey interview, such as asking respondents 
to make a secret random selection of a question. 






Figure 3: Cards 1 and 2 Compared

Subsample 1, Card 1

Box A
• United States citizen
• Student, work, business or tourist visa (I am not in violation of admission period limits or work restrictions)

Box B
• Legal permanent resident (with a valid and official green card issued to me by the U.S. government)
• Currently "undocumented" (Right now, I do not have a currently valid, legal U.S. immigration status)
• Refugee or asylee (approved, not applicant)

Box C
• TPS*, parolee, or some other category (Not in Box A or Box B)

Subsample 2, Card 2

Box A
• Legal permanent resident (with a valid and official green card issued to me by the U.S. government)
• Refugee or asylee (approved, not applicant)

Box B
• United States citizen
• Student, work, business or tourist visa (I am not in violation of admission period limits or work restrictions)
• Currently "undocumented" (Right now, I do not have a currently valid, legal U.S. immigration status)

Box C
• TPS*, parolee, or some other category (Not in Box A or Box B)

*Temporary Protected Status

Sources: GAO; Corel Draw (flag and suitcase); DHS (resident alien cards). (The actual size of each card is 8-1/2" by 11.")



Thus, the percentage of the foreign-born population who are currently 
undocumented can be estimated as follows: 

• Start with the percentage of subsample 1 respondents who report 
that they are in Box B of Card 1 (hypothetical figure: 62 percent of 
subsample 1). 






• Subtract from this the percentage of subsample 2 who say they are 
in Box A on Card 2 (hypothetical figure: 33 percent of subsample 2). 

• Observe the difference (29 percent, based on the hypothetical 
figures); this represents an estimate of the percentage of the 
foreign-born population who are undocumented. 

Alternatively, a "mirror-image" estimate could be calculated, using Box B
of Card 2 and Box A of Card 1. 21
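The subtraction and its mirror image can be sketched in a few lines of Python. Only the 62 percent (Box B, Card 1) and 33 percent (Box A, Card 2) figures come from the hypothetical example above; the other shares are assumptions chosen so that 4 percent pick Box C on each card, and the subsample sizes of 2,000 are also hypothetical. The standard-error function is the textbook approximation for a difference between two independent simple random samples; it ignores weighting, clustering, and other design effects, and is not presented as the method of this report.

```python
import math


def indirect_estimate(card1, card2):
    """Undocumented share = Box B share on Card 1 minus Box A share on Card 2."""
    return card1["B"] - card2["A"]


def mirror_estimate(card1, card2):
    """Mirror-image estimate = Box B share on Card 2 minus Box A share on Card 1."""
    return card2["B"] - card1["A"]


def se_of_difference(p1, n1, p2, n2):
    """Approximate standard error of p1 - p2 for two independent simple random samples."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Box shares as proportions of each subsample (hypothetical; Box C assumed to be 4 percent on both cards).
card1 = {"A": 0.34, "B": 0.62, "C": 0.04}
card2 = {"A": 0.33, "B": 0.63, "C": 0.04}

print(round(indirect_estimate(card1, card2), 2))  # 0.29
print(round(mirror_estimate(card1, card2), 2))    # 0.29

# Hypothetical subsample sizes of 2,000 respondents each.
se = se_of_difference(card1["B"], 2000, card2["A"], 2000)
print(f"approximate standard error: {se:.3f}")
```

The two estimates agree here only because Box C was assumed equal on both cards. Since each card's shares sum to one, (B1 - A2) - (B2 - A1) = C2 - C1, which is the condition stated in footnote 21.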

To estimate the numerical size of the undocumented population, a 
grouped answers estimate of the percentage of the foreign-born who are 
undocumented would be combined with a census figure. For example, the 
2000 census counted 31 million foreign-born, and the Census Bureau 
issued an updated estimate of 35.7 million for 2005. The procedure would 
be to simply multiply the percent undocumented (based on the grouped 
answers data and the subtraction procedure) by a census count or an 
updated estimate for the year in question. 
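The scaling step is a single multiplication. In this sketch the 29 percent share is the hypothetical figure from the example above, and 35.7 million is the Census Bureau's 2005 estimate cited in the text; the function name is illustrative only.

```python
def estimated_undocumented_count(undocumented_share, foreign_born_total):
    """Scale a grouped answers share by a census count or updated population estimate."""
    if not 0.0 <= undocumented_share <= 1.0:
        raise ValueError("share must be a proportion between 0 and 1")
    return undocumented_share * foreign_born_total


# Hypothetical 29 percent share applied to the 2005 foreign-born estimate of 35.7 million.
count = estimated_undocumented_count(0.29, 35_700_000)
print(f"{count / 1e6:.1f} million")  # 10.4 million
```

Because the share is hypothetical, the resulting figure illustrates the arithmetic only and is not an estimate. Treating the census figure as fixed means that any sampling error in the share carries through proportionally to the count.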

These procedures ensure that no respondents — and no data on any 
specific respondent — are ever separated out or categorized as 
undocumented, not even during the analytic process of making indirect, 
group-level estimates. 

To further ensure reduction of "question threat," the grouped answers 
question series begins with flash cards that ask about nonsensitive topics 
and familiarize respondents with the 3-box approach. For each 
nonsensitive-topic card, interviewers ask the respondent which box
applies to him or her, saying, "If it's Box B, we do not want to know which
specific category applies to you."

In this way, most respondents should understand the grouped answers 
approach before seeing the immigration-status card. 



21 The result of the subtraction would be the same either way, assuming that the same
percentage of subsample 1 and subsample 2 chose Box C.






To help ensure accurate responses, respondents who choose Box A can be 
asked a series of clarifying questions. 22 (No follow-up questions are 
addressed to anyone choosing Box B.) The questions for Box A 
respondents are designed to prompt them to, essentially, reclassify 
themselves in Box B, if that is appropriate. 23 
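Footnote 23 describes a data-handling rule: when a Box A respondent reclassifies to Box B, only the Box B classification is kept. A minimal sketch of that rule, assuming a simple in-memory record, might look like the following. The `StatusResponse` class and both functions are hypothetical; a production system would also need to purge the detail from stored files and backups, which this sketch does not address.

```python
from dataclasses import dataclass, field


@dataclass
class StatusResponse:
    """Hypothetical record of one respondent's answer on an immigration status card."""
    card: int
    box: str
    box_a_followups: dict = field(default_factory=dict)


def record_followups(response, answers):
    """Store clarifying answers; no follow-up questions are addressed to Box B respondents."""
    if response.box == "B":
        raise ValueError("no follow-up questions are addressed to Box B respondents")
    response.box_a_followups.update(answers)


def reclassify_to_box_b(response):
    """Keep only the Box B classification, discarding the original box and all follow-up detail."""
    response.box = "B"
    response.box_a_followups.clear()


resp = StatusResponse(card=1, box="A")
record_followups(resp, {"green_card_received": "applied, not yet received"})
reclassify_to_box_b(resp)
print(resp)  # StatusResponse(card=1, box='B', box_a_followups={})
```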

The grouped answers question series can potentially be applied in a large- 
scale general population survey, where the questions on immigration 
status would be added for the foreign-born respondents — provided that an 
appropriate survey can be identified. If a new survey of the general 
foreign-born population were planned, it would involve selecting a general 
sample of households and then screening out the households that do not 
include one or more foreign-born persons. 

Finally, we note that while the initial version of the grouped answers 
approach involved three alternative flash cards (and was termed the 
"three-card method"), we recently devised the version described here, 
which uses two cards rather than three. The two-card method is simpler, is 
easier to understand, and provides more precise estimates. All cards are 
alike in that they feature three boxes in which specific answer categories 
are grouped. 



Characteristics, Costs, and 
Contributions Can 
Potentially Be Estimated 



Generally, grouped answers questions on immigration status would be 
asked as part of a larger survey that includes direct questions on 
demographic characteristics and employment and might include questions 
on school attendance, use of medical facilities, and so forth; some surveys 
also ask specific questions that can help estimate taxes paid. Potentially, 
combining the answers to such questions with grouped answers data can 
be used to provide further information on the characteristics, costs, and 
contributions of the undocumented population. 



22 For example, in the test with Hispanic farmworkers, respondents who picked Box A and
said they were legal permanent residents (they had a green card) were asked (1) under 
which program they had applied for a green card (Family Unity, employer, and so forth), 
(2) whether they had received the card (or had applied but not yet received it), (3) how 
they received it (in person or by mail), and (4) whether they had then applied for U.S. 
citizenship — and if so, whether they had received citizenship. 

23 If a respondent decides to reclassify himself or herself in Box B, on the basis of follow-up 
questions, survey procedures can record only the Box B classification — and delete the 
original Box A classification, as well as any answers to Box A follow-up questions. This 
prevents retention of any detailed immigration-status material on respondents in Box B. 






For example, the numbers of undocumented persons in major subgroups 
— such as demographic or employment status subgroups — can be 
estimated, provided that the sample of foreign-born persons interviewed is 
sufficiently large. 

Grouped answers data collected from adult respondents can also be used 
to estimate the number of children in various immigration statuses, 
including undocumented — provided that an additional question is asked. 24 
Additionally, when combined with separate quantitative data (for example, 
data on program costs per individual), grouped answers data can be used 
to estimate quantitative information (such as program costs) for the 
undocumented population as a whole — or, again, depending on sample 
size, for specific subgroups. 
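One plausible way to extend the subtraction to a subgroup and then to a cost figure is sketched below. This is an assumption-laden illustration, not the procedure in appendix II (which may differ, for example by applying survey weights): the subgroup label, the toy responses, the subgroup size of 1 million, and the $500 per-person cost are all invented for the example.

```python
from collections import Counter


def box_share(responses, subgroup, box):
    """Share of a subsample's respondents in `subgroup` who picked `box`."""
    boxes = [b for g, b in responses if g == subgroup]
    if not boxes:
        raise ValueError(f"no respondents in subgroup {subgroup!r}")
    return Counter(boxes)[box] / len(boxes)


def subgroup_undocumented_share(card1_responses, card2_responses, subgroup):
    """Subtraction estimate restricted to one subgroup: Box B (Card 1) minus Box A (Card 2)."""
    return box_share(card1_responses, subgroup, "B") - box_share(card2_responses, subgroup, "A")


# Toy (subgroup, box) responses, invented for illustration only.
card1 = [("employed", "B")] * 6 + [("employed", "A")] * 3 + [("employed", "C")]
card2 = [("employed", "A")] * 3 + [("employed", "B")] * 6 + [("employed", "C")]

share = subgroup_undocumented_share(card1, card2, "employed")
# Hypothetical inputs: 1,000,000 foreign-born in the subgroup, $500 program cost per person.
estimated_cost = share * 1_000_000 * 500
print(round(share, 2), round(estimated_cost))
```

As in the text, no individual record is ever labeled undocumented; only group-level shares are differenced.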

The procedures for deriving these more complex indirect estimates are 
described in appendix II. No grouped answers respondent is ever 
categorized as undocumented. 



Statistical Information Is 
Needed on the 
Undocumented Population 



The foreign-born population of the United States is large and growing — 
as is the undocumented population within it. Congressional policymakers, 
the U.S. Commission on Immigration Reform, and the National Research 
Council's (NRC) Committee on National Statistics have indicated a need 
for statistical information on the undocumented population, including its 
size, characteristics, costs, and contributions. 



The Census Bureau estimates that as of 2005, foreign-born residents (both 
legally present and undocumented) numbered 35.7 million and accounted 
for at least one-tenth of all persons residing in each of 15 states and the 
District of Columbia. 25 These figures represent substantial increases over 
the prior 15 years. For example, in 1990 the foreign-born population 
totaled fewer than 20 million; only 3 states had a population more than 



24 The additional question would ask for the number of foreign-born children in the
household who are in each box of the same immigration status card that the adult 
respondent used to report which box he or she is in. However, this questioning approach 
has not been tested. 

25 The 15 states and their percentages of foreign-born residents in 2005 were Arizona, 14.5; 
California, 27.2; Colorado, 10.1; Connecticut, 12.5; Florida, 18.5; Hawaii, 17.2; Illinois, 13.6; 
Maryland, 11.7; Massachusetts, 14.4; Nevada, 17.4; New Jersey, 19.5; New York, 21.4; 
Rhode Island, 12.6; Texas, 15.9; Washington, 12.2. The percentage in the District of 
Columbia was 13.1. 






one-tenth foreign-born. One result is that, as the Department of Labor has
testified, foreign-born workers now constitute almost 15 percent of the
U.S. labor force, and the numbers of such workers are growing. 26

A new paper from the Department of Homeland Security (DHS) puts the 
"unauthorized" immigrant population at 10.5 million as of January 2005 
and indicates that if recent trends continued, the figure for January 2006 
would be 11 million. 27 The Pew Hispanic Center's indirect estimate of the 
undocumented population as of 2006 is 11.5 million to 12 million. These 
estimates represent roughly one-third of the entire foreign-born 
population. 28 DHS has variously estimated the size of the undocumented 
population as of January 2000 as 7 million and 8.5 million. 29 Government 
and other estimates for 1990 numbered only 3.5 million. 30 

These various indirect estimates of the undocumented population are 
based on the "residual method." Residual estimation (1) starts with a 
census count or survey estimate of the number of foreign-born residents 
who have not become U.S. citizens and (2) subtracts out estimated 
numbers of legally present individuals in various categories, based on 
administrative data and assumptions (because censuses and surveys do 
not ask about legal status). The remainder, or residual, represents an 
indirect estimate of the size of the undocumented population. 
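The residual calculation described above reduces to one subtraction. The sketch below uses arbitrary round numbers (in millions) purely to show the arithmetic; they are not census, survey, or administrative data, and the component labels are simplified stand-ins for the categories analysts actually use.

```python
def residual_estimate(noncitizen_foreign_born, legal_components):
    """Residual method: noncitizen foreign-born count minus estimated legally present noncitizens."""
    legally_present = sum(legal_components.values())
    residual = noncitizen_foreign_born - legally_present
    if residual < 0:
        raise ValueError("legal-population estimates exceed the noncitizen count")
    return residual


# Arbitrary illustrative values, in millions; not actual data.
legal = {
    "legal permanent residents": 8.0,
    "refugees and asylees": 1.0,
    "temporary visa holders": 1.5,
}
print(residual_estimate(20.0, legal))  # 9.5
```

Every term on the subtraction side rests on administrative counts plus assumptions, so errors in any component pass directly into the residual.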

To illustrate the role of administrative data and assumptions, residual 
estimates draw on counts of the number of new green cards issued each 



26 Statement of Ronald Bird, Chief Economist, Office of the Assistant Secretary for Policy,
U.S. Department of Labor, before the Committee on the Judiciary, U.S. Senate, July 5, 2006. 

27 Michael Hoefer, Nancy Rytina, and Christopher Campbell, Estimates of the Unauthorized 
Immigrant Population Residing in the United States: January 2005 (Washington, D.C.: 
Department of Homeland Security, Office of Immigration Statistics, August 2006). 

28 Jeffrey S. Passel, "The Size and Characteristics of the Unauthorized Migrant Population in 
the U.S.: Estimates Based on the March 2005 Current Population Survey," Research Report 
(Washington, D.C.: Pew Hispanic Center, Mar. 7, 2006). 

29 The first figure is from U.S. Immigration and Naturalization Service, Office of Policy and 
Planning, Estimates of the Unauthorized Immigrant Population Residing in the United 
States: 1990 to 2000 (Washington, D.C.: January 2003); the second is from Hoefer, Rytina, 
and Campbell. 

30 While different estimates are based on different definitions of undocumented, and there 
are questions about data reliability, it seems clear that the population of undocumented 
foreign-born persons is large and has increased rapidly. 






year. But they also require assumptions to account for emigration and 
deaths among those who received green cards in earlier years. 

A recent DHS paper providing residual estimates of the undocumented 
population includes ranges of estimates based on alternative assumptions 
made for two key components. 31 For example, "by lowering or raising the 
emigration rates 20 percent . . . the estimated unauthorized immigrant 
population would range from 10.0 million to 11.0 million." 32 The DHS paper 
also lists assumptions that were not subjected to alternative specifications. 
We believe the DHS paper represents an advance because, up to now, 
analysts producing residual estimates have generally not made public 
statements regarding the precision of the estimates. (Some critics have, 
however, indicated that residual estimates are likely to lack precision. 33 ) 
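The DHS sensitivity range can be illustrated with a deliberately simplified model in which the only varied assumption is an emigration rate applied to past legal admissions. Everything here is hypothetical: the model omits mortality and census undercount adjustments, and the inputs are arbitrary values in millions, not DHS figures. Lowering the emigration rate leaves more legal residents in the country and so shrinks the residual; raising it does the opposite.

```python
def residual_with_emigration(noncitizen_count, legal_admissions, emigration_rate, other_legal):
    """Simplified residual: admissions still present = admissions * (1 - emigration_rate)."""
    still_present = legal_admissions * (1 - emigration_rate)
    return noncitizen_count - still_present - other_legal


def sensitivity_range(base_rate, change=0.20, **inputs):
    """Re-run the residual with the emigration rate lowered and raised by `change` (20 percent)."""
    rates = (base_rate * (1 - change), base_rate, base_rate * (1 + change))
    estimates = [residual_with_emigration(emigration_rate=r, **inputs) for r in rates]
    return min(estimates), max(estimates)


# Arbitrary illustrative inputs, in millions; not DHS figures.
low, high = sensitivity_range(
    base_rate=0.10, noncitizen_count=20.0, legal_admissions=10.0, other_legal=1.0
)
print(f"{low:.1f} to {high:.1f} million")  # 9.8 to 10.2 million
```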

While the residual approach has been used to profile the undocumented 
population on two characteristics — age and country of birth — it is limited 
with respect to estimating (1) current geographic location and (2) current 
employment and benefit use. The reason is that current characteristics of 
legally present persons are not maintained in administrative records; 
analysts must therefore rely largely on assumptions. 34 In contrast, the 
grouped answers method does allow for the possibility of estimating 
current characteristics based on current self-reports. 

During the mid-1990s, the U.S. Commission on Immigration Reform 
determined that better statistical "information on legal status and type of 
immigrant [is] crucial" to assessing immigration policy. Indeed, the 



31 The alternative assumptions were made for levels of (1) American Community Survey
(ACS) undercounting of "unauthorized" immigrants and (2) emigration from the United 
States on the part of legal immigrants counted as having been "admitted" between 1980 and 
2004. 

32 Hoefer, Rytina, and Campbell, p. 6. 

33 See Kenneth Hill, "Estimates of Legal and Unauthorized Foreign-Born Population for the 
United States and Selected States Based on Census 2000," presentation at the U.S. Census 
Bureau Conference, Immigration Statistics: Methodology and Data Quality, Alexandria, 
Virginia, February 13-14, 2006. A similar point was made by Jacob S. Siegel and David A. 
Swanson, The Methods and Materials of Demography, 2nd ed. (San Diego, Calif.: Elsevier 
Academic Press, 2004), p. 479. 

34 Administrative records on where legal immigrants live are based on their residence (or 
intended residence) at the time when legal permanent resident status was attained; these 
records have not been subsequently updated. There are no administrative records on 
current activities of legal permanent residents, such as employment. 






Commission called for a variety of improvements in estimates of the costs 
and benefits associated with undocumented immigration. 35 NRC's 
Committee on National Statistics further emphasized the need for better 
information on costs, especially state and local costs. 36 (If successfully 
fielded, the grouped answers method might help provide general 
information on such costs — and, potentially, specific information for large 
states such as California. Sample size limitations would be likely to 
prohibit separate analyses for specific local areas, small states, and states 
with low percentages of foreign-born or undocumented.) 

Over the years, we have received numerous congressional requests related 
to estimating costs associated with the undocumented population. 37 
Recent Census Bureau research and conferences reflect the realization 
that undocumented immigration is a key component of current population 
growth and that there is a resultant need for information on this group. 38 
Additionally, some of the immigrant advocates we interviewed expressed 
an interest in being able to better describe the contributions of the 
undocumented population. 



Surveys Are a Key
Information Source

Various national surveys ask foreign-born respondents to provide
information about themselves and, in some cases, other persons in their
households. While such surveys provide a wealth of information on a wide
variety of areas, including some sensitive topics, national surveys
generally do not ask about current immigration status, with the exception
of a question on U.S. citizenship, which is included in several surveys.



35 See U.S. Commission on Immigration Reform, U.S. Immigration Policy: Restoring
Credibility: 1994 Report to Congress (Washington, D.C.: U.S. Government Printing Office, 
1994), pp. 179-86. 

36 NRC, Committee on National Statistics, Local Fiscal Effects of Illegal Immigration: 
Report of a Workshop (Washington, D.C.: National Academy Press, 1996), pp. 1-2.

37 See, for example, GAO, Illegal Alien Schoolchildren: Issues in Estimating State-by-State
Costs, GAO-04-733 (Washington, D.C.: June 23, 2004), and Undocumented Aliens: 
Questions Persist about Their Impact on Hospitals' Uncompensated Care Costs, 
GAO-04-472 (Washington, D.C.: May 21, 2004). For a more general discussion, see 
GAO/GGD-98-164, ch. 2, "Policy-Related Information Needs." 

38 Census Bureau staff told us that this research includes J. Gregory Robinson, 
"Memorandum for Donna Kostanich," DSSD ACE. Revision II Memorandum Series No. 
PP-36, U.S. Bureau of the Census, Washington, D.C., December 31, 2002. 






As we reported earlier, it is believed that direct questions on immigration 
status "are very sensitive, and negative reactions to them could affect the 
accuracy of responses to other questions on [a] survey." 39 Two surveys that 
have asked respondents directly about immigration status for several 
years are 

• the National Agricultural Workers Survey (NAWS), an ongoing annual 
cross-sectional self-report survey of farmworkers, fielded by Aguirre 
International, a private sector firm under contract to the Department of 
Labor, since 1988, 40 and 

• the Survey of Income and Program Participation (SIPP), a longitudinal 
panel survey of the general population, conducted by the Census Bureau, 
which has asked immigration status questions since 1996. 

Of the two, SIPP is the more relevant, because its immigration status 
questions have been administered to a sample of the general foreign-born 
population. 

SIPP has asked an adult respondent-informant from each household to 
provide information about himself or herself and about others in his or her 
household, including which immigration-status category applied to each 
person when he or she came to this country. Answers are facilitated by a 
flash card that lists major legal immigration statuses (see fig. 4). 41 A further 
question asks whether each person obtained a green card after arriving in 
this country. The SIPP questions come close to asking about — but do not 
actually allow an estimate of — the number of foreign-born U.S. residents 



39 GAO/GGD-98-164, p. 3.

40 While NAWS data collections are fielded annually, results are generally reported every 
other year. See U.S. Department of Labor, Findings from the National Agricultural 
Workers Survey (NAWS) 2000-2002: A Demographic and Employment Profile of United 
States Farm Workers. Research Report 9 (Washington, D.C.: March 2005). 

41 The SIPP flash card has neither an undocumented category nor an "other status not 
listed" category. However, persons reported to have an immigration status not on the SIPP 
card — which would logically include undocumented persons as well as a small number of 
persons in various minor legal immigration categories — are tallied separately. 






who are currently undocumented. 42 According to the Census Bureau, SIPP 
is now scheduled to be "reengineered," but the full outlines of the revised 
effort have not been set. 

Figure 4: SIPP Flash Card

CARD U
IMMIGRATION STATUS AT TIME OF ENTRY

1 - Immediate relative or family-sponsored permanent resident
2 - Employment-based permanent resident
3 - Other permanent resident
4 - Granted refugee status or granted asylum
5 - Non-immigrant (e.g., diplomatic, student, business, or tourist visa)

SIPP-24204 (1-16-2004)

Source: U.S. Bureau of the Census. (The actual size of the card is 8-1/2" by 11.")

42 Although NAWS and SIPP have received OMB clearance (under the Paperwork Reduction
Act), and although no special field problems have emerged, it is difficult to say whether
field problems might arise in the future. Reasons include question threat and related problems
depending, in part, on contextual factors, such as current levels of immigration 
enforcement in the nonborder areas of the United States, and the perceived relevance of 
the question to the survey. 






The Grouped Answers
Approach Has Been Tested
in Surveys Fielded by
Private Sector
Organizations

In the middle to late 1990s, the grouped answers question series was
subjected to preliminary development and testing with Hispanic
respondents, including interviews with farmworkers conducted by Aguirre
International, under contract to GAO. 43 In these tests, every respondent
picked a box. 44 However, these interviews were not conducted under
conditions of a typical large-scale survey in which interviewers initiate
contact with respondents in their homes. 45

To further test respondents' acceptance of the grouped answers approach,
the Census Bureau created a question module with 3-box flash cards and
contracted for it to be added to the 2004 GSS. When presenting the survey
to respondents, interviewers explained that NORC of the University of
Chicago fielded the GSS, with "core funding" from an NSF grant. 46
The Census Bureau's question module included cards from the three-card
version of the grouped answers approach, which features only one
immigration status category in Box A. The cards used were

• the two training cards shown in figures 5 and 6 47 and

• the immigration status card shown in figure 7. 48



43 The contract specified that Aguirre would provide GAO data on actual responses that had
been "stripped of person-identifiers and related information." 

44 Additionally, GAO conducted cognitive interviews focused on testing the appropriateness 
of the icons used on the cards (see GAO/GGD-00-30, pp. 44-45). Cognitive interviewing 
focuses on the mental processes of the respondent while he or she is answering a survey 
question. The goals are to find out what each respondent thinks the question is asking, 
what the specific words or phrases (or icons on a card) mean to him or her, and how he or 
she formulates an answer. Typically, cognitive interviewing is an iterative process in which 
the findings or problems identified in each set of interviews are used to modify the 
questions to be tested in the next set of interviews. 

45 GAO/GGD-98-164 and GAO/GGD-00-30.

46 The GSS consists of a "core" question series and additional "modules." The funding for 
fielding the core question series is provided by a grant from NSF. The modules are question 
series added through a variety of grants and contracts. 

47 An expert reviewer of a draft of this report noted that the housing types on the training 
card shown in figure 5 are not all mutually exclusive; that is, a single family house can be 
located on a farm. 

48 These cards were initially subjected to 1997-98 developmental tests conducted with more 
than 100 Hispanic immigrants who were farmworkers or in other situations such as 
applying for aid at a legal clinic specializing in immigration cases — such that a fair number 
of those interviewed seemed relatively likely to be undocumented. See GAO/GGD-00-30 
and GAO/GGD-98-164. 






Figure 5: Training Card 1 






Sources: GAO; Dominican Republic (illustrations). (The actual size of the card is 8-1/2" by 11.")






Figure 6: Training Card 2 







Figure 7: Immigration Status Card Tested in GSS

Box A
• Legal permanent resident (With a valid and official green card issued to me by the U.S. government)

Box B
• United States citizen
• Student, work, or tourist visa
• Undocumented (I do not have my own valid official green card)
• Refugee or asylee (Without a green card)

Box C
• Some other category (Not in Box A or Box B)

Sources: GAO; Corel Draw (flag and suitcase); DHS (resident alien cards). (The actual size of the card is 8-1/2" by 11.")

Training card 1 shows different types of houses arranged in three boxes.
Respondents are asked to indicate the type of house they lived in when in
their home country, by picking a box. They are told that if the answer is
in Box B, the interviewer does not need to know which specific type
applies to them, because the focus at that point is on Box A.



Training card 2 shows different modes of transportation, again arranged in 
three boxes. Respondents are asked to indicate the mode of transportation 
they used the most recent time they traveled from their home country to 






the United States, again by picking a box. They are again told that if the
answer is in Box B, the interviewer does not need to know which specific mode applies.

Additionally, the GSS-Census Bureau module asked interviewers to 
(1) judge respondents' understanding of the 3-box format, (2) observe 
whether respondents objected or "kept silent for a while" when presented 
with the immigration status card, and (3) record any comments that 
respondents made about the cards. As the Census Bureau has noted, the 
module was a partial test because only one immigration status card was 
tested. 

Data and documentation from this field test became available in late 2005. 
A Census Bureau analysis of these data (completed in 2006 and
reproduced in full in appendix IV) indicates that of 237 foreign-born
respondents, 216 (roughly 90 percent) chose a box, 4 gave other answers, 
and 17 refused or said "don't know." The Census Bureau took this "as an 
indication that most foreign-born who are asked about their migrant status 
in this format would understand the question, know the answer, and 
answer willingly." 
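The reported tallies can be checked with simple arithmetic. The counts below are those given in the Census Bureau analysis; the dictionary labels are our own shorthand.

```python
# Responses to the GSS immigration status card, as reported in the Census Bureau analysis.
counts = {
    "chose a box": 216,
    "gave other answers": 4,
    "refused or said \"don't know\"": 17,
}
total = sum(counts.values())
print(f"foreign-born respondents: {total}")  # foreign-born respondents: 237
for outcome, n in counts.items():
    print(f"{outcome}: {n} ({100 * n / total:.1f} percent)")
```

The three categories sum to 237, and 216 of 237 is about 91.1 percent, consistent with the "roughly 90 percent" figure; the other two groups are about 1.7 and 7.2 percent.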

Further, the Census Bureau paper stated that 

• the "overwhelming majority of foreign-born respondents" picked a 
box on the immigration status card without — according to 
interviewers — any objection, hesitation, or periods of silence; 

• while some interviewers did not give a judgment or were confused 
about rating respondents' understanding, about 80 percent of 
respondents were coded as understanding and about 10 percent as 
not; 49 and 

• some respondents' comments, written in by interviewers, indicated 
that although the GSS is a "personal interview" survey, telephone 
interviews had been substituted, in some cases, and this meant that 
respondents could not see the cards — making the use of the 3-box 
format difficult. 

The Census Bureau's paper highlighted various limitations of the 2004 
GSS test, including (1) testing only one immigration status card, 



49 The Census Bureau's paper said that field representatives reported that the remaining 
respondents were in doubt and may not have understood. 






(2) underrepresenting Hispanics, and (3) in some instances interviewing 
over the telephone (instead of in person), so that respondents did not see 
the flash cards. 50 



Experts Seem to 
Accept "Grouped 
Answers" Questions If 
Fielded by a Private 
Sector Organization 



The acceptability of the grouped answers approach appears to be high, 
when implemented in surveys fielded by a university or private sector 
organization. Many immigration experts, including advocates, accepted 
the grouped answers approach, although some conditioned their 
acceptance on a quality implementation in a survey fielded by a university 
or other private sector organization. An independent statistical expert 
believed that the grouped answers approach would be generally usable 
with survey respondents. 



Keys to Acceptance Are 
Fielding by a Private 
Sector Organization, Data 
Protections, and Quality 
Implementation 



Some of the researchers and advocates we contacted were extremely 
enthusiastic about the potential for new data. No one objected to 
statistical, policy-relevant information being developed on the size, 
characteristics, costs, and contributions of the undocumented population. 
Overall, the immigration experts we contacted (listed in appendix I, table 
5) accepted the grouped-answers question approach — although advocates 
sometimes conditioned their acceptance on, for example, the questions 
being asked in a survey fielded by a university or private sector 
organization — with data protections built in. Many also offered 
suggestions for maximizing cooperation by foreign-born respondents or 
ideas about how advocacy organizations might help. 

Some advocates indicated that a key condition of their support would be 
that (1) the grouped answers question on immigration status be asked by a 
university or private sector organization and (2) identifiable data (that is, 
respondents' answers linked to personal identifiers) be maintained by that 
organization. Two advocate organizations specifically stated that they 
"could not endorse," or implied they would not support, the grouped 
answers approach, assuming the data were collected and maintained by, in 
one case, the Census Bureau and, in the other case, the government. Many 
other immigration experts and advocates preferred that grouped answers 



50 The Census Bureau's paper also noted that the nonresponse rate for the GSS overall (that
is, averaged across a combination of U.S.-born and foreign-born persons selected for the 
sample) was 29.6 percent. (Persons who are selected for interview but not interviewed may 
be either native-born or foreign-born; because they were never asked and never reported 
where they were born, a specific response rate for the foreign-born cannot be calculated.) 



Page 27 



GAO-06-775 Estimating the Undocumented Population 



data on immigration status be collected by a university or other reputable 
private sector organization pledged to protect the data. 

The immigration advocates said that private sector fielding of a grouped 
answers survey and protection of such data from nonstatistical uses that 
might harm immigrants were key issues because 

• Some foreign-born persons are from countries with repressive 
regimes and thus have more fear of (less trust in) government than 
the typical U.S.-born person. 

• Despite current law protecting individual data from disclosure, 
some persons believe that information collected by a government 
agency such as the Census Bureau is routinely shared (or that in 
some circumstances it might be shared) across government 
agencies. Further, one advocate pointed out that the Congress could 
change the current law, eliminating that protection. (Although the 
grouped answers approach does not identify anyone as 
undocumented, it does provide some information regarding each 
respondent's immigration status.) 

• Extremely large-scale data collections — notably, the American 
Community Survey (ACS) — can yield estimates for areas small 
enough that if the data were publicly available, they could be used 
for nonstatistical, nonpolicy purposes. Some advocates referred to 
the World War II use of census data to identify the areas where 
specific numbers of persons of Japanese origin or descent resided. 
They also pointed out that Census Bureau data on ethnicity — 
including counts of Arab Americans — are publicly available by zip 
code. (The Census Bureau, unlike other government agencies and 
private sector survey organizations, is associated with extremely 
large-scale data collections, and some persons may not fully 
differentiate Census Bureau data collection efforts of different 
sizes.) 

• Hostility to or lack of trust in the Census Bureau might lower 
response rates among foreign-born persons, given the World War II 
experience of persons of Japanese origin or a more recent 
incident in which Census Bureau staff helped a DHS enforcement 
unit access publicly available data on ethnicity by zip code. 51 DHS 
stated that it did not use these data and had not requested the 
information by zip code. 52 The Census Bureau clarified its position 
on providing help to others requesting publicly available data. 53 

Various advocates saw the issues listed above as linked to their own 
acceptance, as well as to respondent acceptance, of a survey. Linking 
these issues to respondent acceptance of a survey was, in some cases, 
echoed by other immigration experts we consulted. 54 Some immigrant 
advocates and other immigration experts counseled us that if there were 
an increase in enforcement efforts in the interior of the United States (as 
opposed to border-crossing areas), foreign-born respondents' acceptance 
of the grouped answers questions would be likely to decrease — at least, if 
the questions were asked in a survey fielded by the government. 

One advocate expressly stated a preference for a grouped answers survey 
with funding by a nongovernment entity, such as a foundation. We 
discussed with a number of immigrant advocates who objected to a 
government-fielded survey the possibility of a survey fielded by a private 



51 See Samia El-Badry and David A. Swanson, "Providing Special Census Tabulations to 
Government Security Agencies in the United States: The Case of Arab-Americans," paper 
presented at the 25th International Population Conference of the International Union for 
the Scientific Study of Population, Tours, France, July 18-23, 2005. One advocate was 
particularly concerned about the possibility that lower respondent cooperation might have 
resulted from these incidents and, if so, might have led to underrepresentation of these 
communities in Census Bureau data. Additionally, one advocate questioned whether local 
estimates of the undocumented might in the future facilitate possible efforts to base 
apportionment on population counts that do not include undocumented residents. We note 
that most large-scale personal-interview surveys do not include sufficient numbers of 
foreign-born respondents to allow indirect grouped answers estimates of undocumented 
persons for small geographic areas, such as zip codes. 

52 See "U.S. Customs and Border Protection Statement on Census Data," Department of 
Homeland Security, Press Office, Washington, D.C., August 13, 2004. 

53 Charles Louis Kincannon, Director, "Procedures for Providing Assistance to Requestors 
for Special Data Products Known as Special Tabulations and Extracts," memorandum to 
Associate Directors, Division Chiefs, Bureau of the Census, Washington, D.C., August 26, 
2004. 

54 It might be noted that SIPP officials told us that when the Census Bureau conducted the 
SIPP survey and asked about immigration status, interviewers did not experience field 
problems. However, SIPP asks about immigration status at the time when respondents 
came to this country (and one other question); SIPP stopped short of a specific question on 
current undocumented status — and the SIPP data do not allow indirect estimation of the 
number who are currently undocumented. 






sector organization with government funding. In some cases, we 
specifically referred to one or both of the following surveys, which 
(1) have been conducted for many years without inappropriate data 
disclosures and (2) ask direct sensitive questions: 

• the National Survey on Drug Use and Health (NSDUH), fielded by RTI 
International under a contract from HHS's Substance Abuse and Mental 
Health Services Administration (SAMHSA), and 

• the National Agricultural Workers Survey (NAWS), fielded by Aguirre 
International, under a contract from the Department of Labor. 55 

The advocates' response was generally to accept the concept of 
government funding of a university's or private sector survey 
organization's field work, provided that appropriate protections of the 
data were built into the funding agreement. 

GAO's contract with Aguirre International for early testing of the grouped 
answers approach with farmworker respondents specified that data on 
respondents' answers would be "stripped of person-identifiers and related 
information." Additionally, the GSS "core funding" grant with NSF and its 
contractual arrangements with sponsors of question modules — such as the 
grouped-answers question insert contracted for by the Census Bureau — 
do not involve the transfer of any data other than publicly available data, 
stripped of identifiers, and limited so as to avoid the possibility of 
"deductive disclosure" with respect to respondent identities or local 
areas. 

Various advocates said that their acceptance was also contingent on 
factors such as 

1. high-quality data, including coverage of persons who have limited 
English proficiency, with special attempts to reach those who are 
linguistically isolated (that is, members of households in which no one 



55 These two examples involve agencies that are viewed neutrally by the immigrant 
advocates we talked with. (Agencies that are viewed negatively by some immigrant 
advocates are DHS and the Census Bureau.) 

56 GSS receives funding for its core questions through a grant from NSF. GSS interviewers 
and advance letters told respondents about the NSF sponsorship. Additionally, respondents 
were told that one purpose of the survey was to inform government officials. 






14 or older speaks English "very well") and to overcome other 
potential barriers (such as cultural differences); 

2. appropriate presentation of the survey, including an appropriate 
explanation of its purpose and how respondents were selected for 
interview; and 

3. transparency — that is, keeping the immigrant community informed 
about or involved in the development and progress of the survey. 

One advocate specifically said that her organization's support would be 
contingent on both (1) the development of more information on 
respondent acceptance within the Asian community — particularly among 
Asians who have limited English proficiency or are linguistically isolated — 
and (2) a survey implementation that is planned to adequately 
communicate with Asian respondents, including those who are 
linguistically isolated or have little education. 57 Although one-fourth of the 
2004 GSS test respondents were Asian, the test was conducted in English 
(allowing help from bilingual household members), and no other tests 
have included linguistically isolated Asians. 58 



Advocates and other experts made several suggestions for maximizing 
respondent cooperation with a survey using the grouped answers question 
series — that is, maximizing response rates for such a survey as well as 
maximizing authentic participation. 

Advocates suggested that the survey (1) avoid taking names or Social 
Security numbers, 59 (2) hire interviewers who speak the respondents' 
home-country language, (3) let respondents know why the questions are 
being asked and how their households came to be selected, (4) conduct 



57 This would mean communication that takes account of cultural as well as language 
concerns. 

58 The 2004 GSS was limited to respondents who either were fluent in English or were 
helped by a household member who was fluent in English; some persons with limited 
English proficiency are likely to have been reached. The preliminary testing and 
development of the grouped answers approach offered a choice of Spanish or English 
interviews. However, linguistically isolated non-Hispanics have not yet been included in 
any test. 

59 Later in this report, we describe potential ways of testing whether respondents "pick the 
correct box" — ways that do not require routine collection of respondent names and Social 
Security numbers as part of the main survey. 



Advocates and Experts 
Suggest Ways to Maximize 
Respondent Cooperation 
and Offer Their Assistance 






public relations efforts, (5) obtain the support of opinion leaders, 
(6) select a survey group from a well-known and trusted university to 
collect the data, and (7) ask respondents about their contributions to the 
American economy through, for example, working and paying taxes. 

Additionally, survey experts suggested 

• using audio-Computer Assisted Self Interview (audio-CASI), 60 

• carefully explaining to respondents how anonymity of response is 
protected, and 

• paying respondents $25 or $30 for participating in the interview. 

Survey experts viewed these elements as key ways of boosting response 
rates or encouraging authentic responses to sensitive questions. For 
example, NAWS, which uses respondent incentives, achieves extremely 
high response rates within cooperating farms — 97 percent in 2002, with a 
$20 payment to farmworkers selected. 

Some immigrant advocates also offered suggestions for how their 
organizations or other advocates might help the effort to develop and field 
the grouped answers approach, including 

1. providing contacts at local organizations to help with arrangements for 
future research, 

2. developing or reviewing Box A follow-up questions, and 

3. serving on an advisory board with other representatives from 
immigrant communities. 61 



60 CASI, or Computer Assisted Self Interview, means that the respondent himself or herself 
uses a laptop to view the questions and flash cards and to indicate his or her answers. 
Audio-CASI adds earphones so that questions and instructions can be spoken to the 
respondent while he or she views the questions on the screen. Audio-CASI programming 
can be completed in any one of several languages. Experts told us that studies have shown 
increased reporting of sensitive items when audio-CASI is used. 

61 Two advocates mentioned positively the transparency that the Census Bureau works 
toward through outreach to immigrant-advocate organizations. This outreach includes 
explanation of data collection goals and policies. 






GSS Data and Independent 
Statistical Consultant 
Review Show "General 
Usability" of the Grouped 
Answers Approach 



As we report above, the Census Bureau's recent analysis of the 2004 GSS 
grouped answers data concluded that the "overwhelming majority of 
foreign-born respondents" picked a box without objection, hesitation, or 
silence. The Census Bureau reported, more specifically, that roughly 
90 percent (216 of 237 respondents) chose a box, 4 gave other answers, 
and 17 refused to answer or said "don't know." 



Our subsequent analysis excluded 19 of the 237 respondents in the Census 
Bureau analysis because 



• 4 were not foreign-born (for example, 1 had been born abroad to 
parents who had, by the time he was born, become naturalized U.S. 
citizens); 

• 1 was not classifiable as either foreign-born or not foreign-born 
(because he did not know whether his parents were born in the 
United States); 

• 4 others were known to have been interviewed on the telephone, 
based on interviewers' written-in comments recorded in the 
computer file (for example, one wrote that the respondent could 
not see the cards because the interview was on the telephone); and 

• 10 others were subsequently found to have been interviewed on the 
telephone, based on a special GSS hand check of the interview 
forms for respondents who had refused or said "don't know," 
which was carried out in response to our request. 62 

As a result, in our analysis we found that only 6 personally interviewed 
foreign-born GSS respondents refused or said "don't know." 63 One of the 
6 was an 18-year-old Mexican who told the interviewer that he did not 
know whether or not he was a legal immigrant. Additionally, we found that 
the 4 respondents who gave "other answers" had provided usable 



62 GSS Director Tom Smith graciously arranged for a hand check of interviews coded refusal 
or "don't know," thus providing key information to us in time for this report. (Specific 
mode-of-interview data for all 2004 GSS respondents will not be available until the end of 
2006.) The GSS Director also said that, overall, about 10 percent of the 2004 GSS interviews 
were conducted over the telephone. 

63 Similar numbers refused or said "don't know" on the two 3-box training cards. 
Specifically, 8 respondents refused or said "don't know" on the housing card, 6 on the 
transportation card. 






information (for example, one called out that he had a student visa) and 
thus could be recoded into an appropriate box. 

After reviewing the two analyses of the GSS test data — the one that the 
Census Bureau performed and the other we performed — Dr. Zaslavsky 
concluded that 

The test confirms the general usability of the [grouped-answers approach] with subjects 
similar to the target population for its potential large-scale use — that is, foreign-born 
members of the general population. Out of about 218 respondents meeting eligibility 
criteria and who were most likely administered the cards in person (possibly including a 
few who had telephone interviews but responded without problems), only 9 did not 
respond by checking one of the 3 boxes. Of these, 3 provided verbal information that 
allowed coding of a box, and 6 declined to answer the question altogether. Furthermore, 
several of these [6] raised similar difficulties with other 3-box questions on nonsensitive 
topics (type of house where born, mode of transportation to enter United States), 
suggesting that the difficulties with the question format were at least in part related to the 
format and not to the particular content of the answers. Thus, indications were that there 
would not be a systematic bias due to respondents whose immigration status is more 
sensitive being unwilling to address the 3-box format. 
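The respondent counts cited in this section can be cross-checked with simple arithmetic. The Python sketch below uses only figures stated in the text (the Census Bureau's 216/4/17 split of 237 respondents, the 19 exclusions in our analysis, and the 9 non-box responses in Dr. Zaslavsky's review, of which 3 were recodable and 6 declined). The variable names are illustrative, and the eligible count of 218 is the approximate figure the reviewer gives.

```python
# Figures stated in the text (2004 GSS grouped answers test).
census_total = 237
census_chose_box = 216
census_other_answer = 4
census_refused_or_dk = 17

# Exclusions in GAO's analysis: not foreign-born, unclassifiable,
# telephone per interviewer comments, telephone per GSS hand check.
exclusions = {"not_foreign_born": 4, "unclassifiable": 1,
              "phone_by_comment": 4, "phone_by_hand_check": 10}

# Dr. Zaslavsky's breakdown of eligible, in-person respondents.
did_not_check_box = 9
recoded_from_verbal = 3
declined = 6


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Consistency checks on the reported figures.
assert census_chose_box + census_other_answer + census_refused_or_dk == census_total
assert recoded_from_verbal + declined == did_not_check_box

eligible = census_total - sum(exclusions.values())   # approximately 218
checked_box = eligible - did_not_check_box           # checked a box directly
codable = checked_box + recoded_from_verbal          # plus verbal answers recoded

print(f"Census share choosing a box: {pct(census_chose_box, census_total)}%")
print(f"Eligible in-person respondents: {eligible}")
print(f"Codable after recoding: {codable} ({pct(codable, eligible)}%)")
print(f"Declined: {declined} ({pct(declined, eligible)}%)")
```

Run as written, it shows that excluding the 19 ineligible or telephone cases leaves 218 respondents, of whom 212 (about 97 percent) gave a codable answer, compared with roughly 91 percent choosing a box in the Census Bureau's unadjusted count.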

Dr. Zaslavsky emphasized the importance of minimizing or completely 
avoiding telephone interviews when using the grouped answers 
approach — or, alternatively, providing advance copies of the cards to 
respondents before interviewing over the telephone. 64 (Dr. Zaslavsky's 
written review is presented in full in appendix III.) 



Various Tests Are or 
May Be Needed 

The findings on respondent acceptance (that is, from the GSS test) raised 
some unanswered questions about acceptance that experts said should be 
addressed. Additionally, the experts said that one or more tests of 
response validity are needed to determine whether respondents "pick the 
correct box" versus systematically avoiding Box B. 



64 Alternatively, we believe that it might be possible to estimate the bias incurred by 
including a small number of telephone interviews in the analysis (or by eliminating them 
from the analysis). 






Questions for Further 
Research Were Suggested 
by the GSS Test 

The independent reviewer of the GSS analyses (Dr. Zaslavsky) concluded that 
four issues should be addressed in future field tests: 

(a) Equivalent acceptability of all forms of the response card, 

(b) Usability with special populations including those with low literacy, the 
linguistically isolated, and concentrated immigrant populations, 

(c) Methods that avoid telephone interviews, or reduce bias and nonresponse due to 
use of the telephone, 

(d) Use of follow-up questions to improve the accuracy of box choices. 

As the independent expert explained with respect to point (b), GSS 
undercoverage of the foreign-born population occurred at least in part 
because interviews were conducted only in English, although household 
members could help respondents with limited English. 65 Various 
colleagues and experts we talked with supported points (a) through (d). 
We further note that points (a) and (c) were covered or touched on in the 
Census Bureau's paper reporting its analysis of the 2004 GSS data. In our 
discussions with Census Bureau staff, they also mentioned that further 
tests of acceptance should include (d) follow-up questions for Box A 
respondents. 

Additionally, some advocates and an immigration researcher suggested 
improving the cards, which might minimize the potential for "don't know" 
or inaccurate answers. A survey expert suggested using focus groups to 
further explore respondent perceptions of the cards — and to potentially 
improve them. 66 

Earlier testing covered a key portion of the populations (Hispanic 
farmworkers) cited in (b) above, was conducted in Spanish, and included 



65 Questions were asked and answers were apparently given in English. 

66 The pretesting and cognitive testing conducted on the cards so far have been limited to 
certain groups of Hispanics. We believe that testing with other groups, potentially including 
focus group testing, could be important before large-scale implementation. It also might be 
appropriate to change specific categories and definitions of statuses on the cards, 
depending on future changes in laws. 






Box A follow-up questions as recommended in (d) above. 67 In those 
interviews, every respondent picked a box. However, 

1. No language other than Spanish or English has been used in testing; 
thus, as one immigrant advocate pointed out, no testing has focused on 
linguistically isolated Asians (those living in households in which no 
adult member speaks English). 

2. The interviews with Hispanic farmworkers were not conducted under 
typical conditions of a household survey. 

3. Only one immigration status card was tested with Hispanic 
farmworkers and in the GSS. 

Therefore, we agree that the acceptance-testing issues the experts raised 
should be considered in assessing the grouped answers approach. 



Studies Should Test 
Whether Respondents Pick 
the Correct Box 



Several experts told us that tests of respondent accuracy — or at least 
respondents' intent to respond accurately — should be conducted. These 
experts emphasized that grouped answers data would not be useful if 
substantial numbers of respondents were to systematically avoid picking 
Box B (that is, to not pick the box with the undocumented category). 
However, one immigration study expert believed that if a response validity 
study involved lengthy delays, fielding a grouped answers survey should 
proceed in advance of a validity study. 

We agree with the experts' position that tests are needed to determine 
whether respondents systematically avoid Box B (even after Box A follow- 
up check questions). Tests of response validity would ideally be conducted 
with the methods of encouraging truthful answers that experts mentioned, 
such as (1) explaining why the survey is being conducted, how the 
respondent was selected, and how the anonymity of answers is ensured, 
and (2) using audio-CASI and, if appropriate, paying respondents for 
participating in the interview. And, as the Census Bureau pointed out, such 
a study should include the full grouped answers question series, including 
follow-up questions, and it should test both Card 1 and Card 2. Even if 
small numbers of respondents were to respond inaccurately, it would be 
helpful to estimate this and adjust for any resulting bias. 



67 In fact, a key part of the earlier testing focused on the development of icons to help 
respondents with limited literacy. 






We discussed various approaches to conducting validity studies with 
immigration experts, including immigrant advocates, and with agencies 
conducting surveys. In reviewing these approaches, we found that 
response validity tests vary according to whether they are conducted 
before, during, or after a survey is fielded. 

Before a large-scale survey is conducted. The grouped answers 
question series could be asked of a special sample of respondents for 
whom the answers are known, in advance, by study investigators on an 
individual-respondent basis. Such knowledge might be based, for example, 
on information that recent applicants for green cards have submitted to 
DHS. 68 "Firewalls" could be used to prevent survey information from being 
given to DHS. We discussed this approach with DHS; however, experts 
criticized a DHS-based validity study on both methodological and public 
relations grounds. 69 An alternative source of data on individuals' 
immigration statuses might avoid these problems, but no alternative 
source has yet been identified. 

Before or as part of a large-scale survey. In either situation (that is, in 
a presurvey study or as part of a survey), respondents could be asked if 
they would be willing to participate in special validity-test activities in 
return for a payment of, say, $25 or $30 for each activity. Later, after 
interviewing had been completed in a given location — not as part of the 
interview process — a sample of respondents who chose Box A (that is, 
those who claimed to be here legally) could be asked to 

• participate in a focus group in which respondents would discuss 
how they felt answering the grouped answers questions when the 
interviewer came to their house and, also, could possibly be asked 
to fill out a "secret ballot" indicating whether they had answered 
authentically in the earlier home interview; 

• give permission for a record check and provide information that 
could subsequently be used in a record check (for example, their 



68 NCHS has suggested that some kind of validity test at the individual level is needed. 
Interviewing persons whose status is known in advance is a classic approach. 

69 One expert scoffed at a validity test limited to persons whose immigration status is 
known to DHS. An immigrant advocate pointed to the issues that arose when the Census 
Bureau helped DHS obtain publicly available information on ethnicity by zip code; she 
indicated that a public relations problem could result even if only carefully crafted, 
carefully protected sharing of information took place. 






name, date of birth, and Social Security number) and permission to 
check these data with the Social Security Administration; 70 or 

• show his or her documentation (for example, green card) to a 
documents expert. 71 

These checks would logically be focused on Box A respondents, for most 
of whom such checks would be less threatening. We believe that it is 
reasonable to assume that most respondents who chose Box B picked the 
correct box. Further, because the survey interview states that there are no 
more questions on immigration if the respondent picks Box B, pursuing 
follow-up validity checks might be deemed inappropriate for Box B 
respondents. 72 

After data are collected. With a large-scale survey, it would be possible 
to conduct comparative analyses after the data were collected. We provide 
three examples. 73 

1. Grouped answers estimates of the percentage undocumented could be 
compared for (a) all foreign-born versus (b) high-risk groups, such as 
those who arrived in the United States within the past 5 or 10 years. 
The expectation would be that with valid responses, a higher estimate 



70 One immigrant advocacy organization pointed out that it would be important in such a 
study to protect the data so that the agency checking records (in this instance, the Social 
Security Administration) could not discover information about any identifiable respondent. 
Protective approaches might include (1) using code numbers and a "third party" model and 
(2) adding numerous "fake" cases to the checklist and notifying the agency that this was 
being done. (See GAO, Record Linkage and Privacy: Issues in Creating New Federal 
Research and Statistical Information. GAO-01-126SP (Washington, D.C.: April 2001).) 

71 The ideas for these approaches are an outgrowth of our discussions concerning NSDUH 
with SAMHSA. The NSDUH project officer said that as part of that survey (which is fielded 
by RTI International in Research Triangle Park, N.C., under a contract with SAMHSA), a 
sample of respondents were offered $25 for a hair sample and $25 for a urine sample. 
Ninety percent of those offered the incentive payments provided one or both samples. 

72 It would be important to craft such a study so that respondents would not be tempted to 
distort information in order to receive payment. One immigrant advocate suggested asking 
"what other experience federal agencies have had with paying a select group of 
respondents to participate in a validity test" to determine "whether the payment approach 
is considered scientifically sound." One way of addressing this concern might be to offer all 
or some Box B respondents a "minimal threat" follow-up opportunity, such as participating 
in a focus group, which could also be associated with a payment. 

73 Other possible comparative analyses might also be useful. DHS suggested comparisons to 
results from the Latin American Migration Project and the New Immigrant Survey. 






of the percentage undocumented would be obtained for those who 
arrived more recently — because, for example, persons who had arrived 
recently were not here during the amnesty in the late 1980s. 74 

2. Comparisons could be made of (a) Box A estimates of specific legal 
statuses and the approximate dates received — notably, the numbers of 
persons claiming to have received valid green cards in 1990 or more 
recently — with (b) publicly available DHS reports of the numbers of 
green cards issued from 1990 to the survey date. 75 

3. Analysts could compare (a) grouped answers estimates of the number 
undocumented overall to (b) estimates of total undocumented 
obtained by the residual method. 76 

Wherever possible, Card 1 and Card 2 should be tested separately for 
accuracy of response. 

The advantage of conducting a validity study in advance of a survey is that 
if significant problems surface, adjustments in the approach can be made. 
Or if the problems are substantial and cannot be easily corrected — and if 
the anticipated survey were to be fielded mostly or only to collect grouped 
answers data — then that survey could be postponed or canceled. However, 
the results of validity tests conducted during or after a survey can be used 
to interpret the data and, potentially, to adjust estimates if it appears that, 
for example, 5 to 10 percent of undocumented respondents had 
erroneously claimed to be in Box A of Card 1. As one expert noted, 



74 This is a version of the standard "known groups" validity test — an approach that NCHS 
suggested using if it is not possible to conduct individual checks. 

75 An expert in immigration studies suggested this test. As DHS's comments indicate, such a 
test would involve adjusting the DHS figures on, for example, the number of green cards 
issued in specific years to account for subsequent return-migration and mortality, as well 
as taking account of survey undercoverage. For information on adjustments needed in 
comparisons involving green cards, see Nancy F. Rytina, Estimates of the Legal Permanent 
Resident Population and Population Eligible to Naturalize in 2004 (Washington, D.C.: 
Department of Homeland Security, Office of Immigration Statistics, February 2006), p. 3, 
table 2. For an analogous comparison for U.S. citizenship, see Jeffrey S. Passel, Rebecca L. 
Clark, and Michael Fix, "Naturalization and Other Current Issues in U.S. Immigration: 
Intersections of Data and Policy," in Proceedings of the Social Statistics Section of the 
American Statistical Association: 1997 (Alexandria, Va.: American Statistical Association, 
1997). 

76 This test was suggested by another expert in immigration studies. Residual estimates are 
based primarily on comparing (1) administrative data on the number of legal immigrants 
with (2) census counts or survey estimates of the number of foreign-born residents who 
have not become U.S. citizens. 






conducting an advance study does not preclude conducting a subsequent 
study during or after the survey. 
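The adjustment mentioned above can be illustrated with a simplified model. The Python sketch below is our illustration, not a method described in this report. It assumes that every foreign-born respondent falls into one of three groups (undocumented, with share u; the legal statuses in Card 1, Box A; and those in Card 2, Box A, with share a2), that Card 1's Box B groups the undocumented with the Card 2, Box A statuses, and that the undocumented share is estimated by subtracting the Card 2, Box A share from the Card 1, Box B share. If a fraction f of undocumented Card 1 respondents wrongly choose Box A, the estimate shrinks by the factor (1 - f), which an estimate of f from a validity study could be used to reverse. The 35 percent value for a2 and all function names are hypothetical.

```python
def expected_box_shares(u: float, a2: float, f: float) -> tuple[float, float]:
    """Expected (Card 1 Box B share, Card 2 Box A share) under the model.

    u  : true undocumented share
    a2 : share holding the legal statuses listed in Card 2, Box A
    f  : fraction of undocumented Card 1 respondents who wrongly pick Box A
    Card 2 answers are treated as accurate in this simplified model.
    """
    card1_box_b = u * (1 - f) + a2
    card2_box_a = a2
    return card1_box_b, card2_box_a


def grouped_estimate(card1_box_b: float, card2_box_a: float) -> float:
    """Undocumented share estimated by subtraction."""
    return card1_box_b - card2_box_a


def adjust_for_misreporting(estimate: float, f: float) -> float:
    """Reverse the (1 - f) shrinkage implied by the model."""
    return estimate / (1 - f)


u_true, a2_share = 0.30, 0.35   # 0.30 from the text; 0.35 is hypothetical
for f in (0.05, 0.10):          # misreporting range mentioned in the text
    naive = grouped_estimate(*expected_box_shares(u_true, a2_share, f))
    adjusted = adjust_for_misreporting(naive, f)
    print(f"f={f:.2f}: naive={naive:.3f}, adjusted={adjusted:.3f}")
```

At f = 0.10 the naive estimate falls to 27 percent, and dividing by 0.9 restores 30 percent. In practice f would itself be estimated with error, so the adjusted figure would carry additional uncertainty that this sketch ignores.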



Some 6,000 Foreign-Born Respondents Are Needed for "Reasonably Precise" Estimates of the Undocumented

Although several factors are involved, and it is not possible to guarantee a specific level of precision in advance, we estimate that roughly 6,000 foreign-born respondents, or more, would be needed for a grouped answers survey. 77 As we explain below, this is based on (1) a precision requirement (that is, a 95 percent confidence interval consisting of plus or minus 3 percentage points), (2) assumptions about the sampling design of the survey in which the questions are asked, and (3) the assumption that approximately 30 percent of the foreign-born population is currently undocumented.

An indirect grouped answers estimate of the undocumented population generally requires interviews with more foreign-born respondents than a corresponding hypothetical direct estimate would, assuming it were possible to ask such questions directly in a major national survey. One key reason is that the main sample of foreign-born respondents must be divided into two subsamples, with half the respondents answering each immigration status card. On this basis alone, one would have to double the sample size required for a direct estimate based on a question asked of all respondents. Further, the estimate of the undocumented, which is obtained by subtraction, combines two separate estimates, each characterized by some degree of uncertainty. 78

Determining the number of respondents required for a "reasonably precise" estimate of the percentage of the foreign-born population who are undocumented involves three key factors:

1. specification of a precision level, that is, choice of a 90 percent or 95 percent confidence level and an interval defined by plus or minus 2, 3, or 4 percentage points;

2. information on (or assumptions about) the sampling design for the main survey and for subsamples 1 and 2; and

3. to the extent possible, consideration of the likely distribution of the foreign-born population across immigration status categories, including the various legal categories and the undocumented category. 79

77 A sample of foreign-born is contained within a general sample of the household population. As we explain in a later section of this report, an efficient way to survey the foreign-born is by piggybacking on an existing, ongoing large-scale survey of the total household population, which includes foreign-born persons, if an appropriate ongoing survey can be identified. A higher-cost alternative would be to identify a new sample of the total household population and screen (by mini-interviews conducted by telephone or in person or both) for households that contain one or more foreign-born persons.

78 The size of the error associated with a grouped answers estimate relative to a direct estimate depends on the distribution of immigration statuses. Assuming that 33.3 percent of foreign-born persons are in the undocumented category, 33.3 percent are in the set of legal statuses in Card 1, Box A, and 33.3 percent are in the set in Card 2, Box A, we would expect the error associated with a grouped answers estimate of the percentage undocumented to be twice that associated with a corresponding direct estimate.
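The doubling described in footnote 78 can be checked with the ordinary binomial variance formula. The sketch below is our illustration, not GAO's computation: it assumes simple random sampling within each half-sample, two independent half-samples of n/2 respondents each, and no Box C answers, so that the share undocumented is estimated as 1 minus the Card 1 Box A share minus the Card 2 Box A share. The function names are hypothetical.

```python
import math

def direct_se(u: float, n: int) -> float:
    """Standard error of a hypothetical direct estimate of the share undocumented,
    if all n respondents could answer a direct question (simple random sampling)."""
    return math.sqrt(u * (1 - u) / n)

def grouped_answers_se(p1: float, p2: float, n: int) -> float:
    """Standard error of u_hat = 1 - p1_hat - p2_hat.

    p1 is the Card 1 Box A share, estimated from one half-sample (n/2 respondents);
    p2 is the Card 2 Box A share, estimated from the other half. Because the two
    half-samples are independent, their variances add.
    """
    half = n / 2
    variance = p1 * (1 - p1) / half + p2 * (1 - p2) / half
    return math.sqrt(variance)

# Footnote 78 scenario: equal thirds undocumented, Card 1 Box A, Card 2 Box A.
n = 6000
third = 1 / 3
ratio = grouped_answers_se(third, third, n) / direct_se(third, n)
print(f"grouped/direct standard error ratio: {ratio:.2f}")  # 2.00
```

The ratio does not depend on n: with equal thirds the grouped answers variance is four times the direct variance, so the standard error is twice as large.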

With respect to the first factor involved in determining sample size, some 
agencies — for example, the Census Bureau and the Bureau of Labor 
Statistics (BLS) — use the 90 percent confidence level. Other agencies use 
the 95 percent level. 

With respect to the second factor, the sampling design of a large-scale, 
nationally representative, personal-interview survey is based on 
probabilistic area sampling rather than simple random sampling of 
individuals. This often reduces the precision of estimates (relative to 
simple random sampling). 80 The reason is that persons selected for 
interview are clustered in a limited number of areas or neighborhoods 
(and residents of a particular neighborhood may tend to be similar). It is 
possible that the design for selecting subsamples 1 and 2 could increase 
precision; however, it is not possible to predict by how much. 81 
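To show what a variance inflation of the size NCHS reported (about 1.6, footnote 80) means in practice, the sketch below converts a design effect into an effective sample size and a confidence-interval widening factor. The 1.6 value comes from one NHIS estimate; applying it to a grouped answers survey is an assumption, and the function names are illustrative.

```python
import math

def effective_sample_size(n: int, deff: float) -> float:
    """Number of simple-random-sample respondents that would give the same variance
    as n respondents selected under a clustered design with design effect deff."""
    return n / deff

def ci_widening_factor(deff: float) -> float:
    """Confidence intervals scale with the standard error, i.e. with the square root
    of the variance, so a variance ratio of deff widens intervals by sqrt(deff)."""
    return math.sqrt(deff)

deff = 1.6  # variance ratio NCHS reported for one NHIS estimate (footnote 80)
print(round(effective_sample_size(6000, deff), 1))  # 3750.0
print(round(ci_widening_factor(deff), 3))           # 1.265
```

Under this assumption, 6,000 clustered interviews carry roughly the information of 3,750 simple-random-sample interviews, and intervals are about 26 percent wider.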



79 If there is no information on the distribution of immigration status, then a potentially very 
large sample size would be estimated, based on a "worst case scenario" distribution. 
However, if there is information, this may allow a given level of precision to be attained 
with a smaller sample. 

80 To illustrate how this occurs in practice, referring to the National Health Interview Survey 
(NHIS), NCHS told us that an estimate of the percentage of persons who are foreign-born, 
18 to 39 years old, and U.S. citizens is characterized by a variance that is roughly 1.6 times 
the variance that would be associated with a corresponding estimate based on simple 
random sampling. (In theory, a complex sampling design could reduce the variance rather 
than increasing it.) 

81 The independent statistical consultant (Dr. Zaslavsky) advised us that rotating the use of 
immigration status cards 1 and 2 in every other household interviewed (balancing the use 
of alternative cards within areas or clusters) might increase precision. The logic is that 
because some areas are defined by factors such as income and ethnicity — which might be 
related to immigration status — rotation would help ensure balance on these factors. 
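One possible way to implement the rotation Dr. Zaslavsky suggested (alternating cards 1 and 2 across successive households within each area) is sketched below. The report does not specify an assignment algorithm. The data layout (a list of area and household identifiers in interview order) and the random choice of starting card in each area are assumptions made for illustration.

```python
import random
from collections import defaultdict

def assign_cards(households, seed=None):
    """Assign immigration status card 1 or 2 to households, alternating within each area.

    households: list of (area_id, household_id) tuples in interview order.
    Returns a dict mapping household_id -> card number (1 or 2).
    Each area starts with a randomly chosen card and then alternates, so the
    households in every area are split between the cards as evenly as possible.
    """
    rng = random.Random(seed)
    next_card = {}   # area_id -> card to give the next household interviewed there
    assignment = {}
    for area, household in households:
        if area not in next_card:
            next_card[area] = rng.choice((1, 2))
        card = next_card[area]
        assignment[household] = card
        next_card[area] = 3 - card  # switch 1 <-> 2 for the next household in this area
    return assignment

# Hypothetical example: two areas, with interviews interleaved between them.
sample = [("A", "a1"), ("B", "b1"), ("A", "a2"), ("A", "a3"), ("B", "b2"), ("A", "a4")]
cards = assign_cards(sample, seed=7)

per_area = defaultdict(lambda: [0, 0])  # area -> [card 1 count, card 2 count]
for area, household in sample:
    per_area[area][cards[household] - 1] += 1
print(dict(per_area))  # {'A': [2, 2], 'B': [1, 1]}
```

Whatever the random starting card, each area ends up balanced to within one household, which is the property the rotation is meant to secure.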






With respect to the third factor, existing residual estimates point to a fairly 
even 3-way split between three main categories — undocumented, U.S. 
citizen, and legal permanent resident. However, there is some uncertainty 
associated with these estimates, the distribution may vary across 
subgroups, and the percentages may change in the future. 82 Therefore, a range
of distributions is relevant. 

Taking each of these factors into account (to the extent possible) and 
using conservative assumptions, we estimated the approximate numbers 
of respondents required for indirect estimates of the undocumented 
population that are "reasonably precise." 

Table 1 shows required sample sizes for the 90 percent confidence level, 
table 2 for the 95 percent level, with precision at plus or minus 2, 3, and 
4 percentage points. In estimating these required sample sizes, we made 
conservative assumptions and specified a range of possibilities for the 
distribution with respect to the undocumented category. 



To identify a single, rough figure for the sample size needed for reasonably 
precise estimates, we focused on 

1. the 95 percent level, which is more certain and, we believe, preferable; 

2. the 30 percent column, because a current residual estimate of the 
undocumented population is in this range; and 

3. the middle row (for plus or minus 3 percentage points), which is a 
midpoint within the area of "reasonable precision" as defined above. 

With this focus, we estimate that roughly 6,000 or more respondents would 
be required. 83 



82 For example, it is possible that new immigration laws would allow large numbers of
currently undocumented persons to legalize their status.

83 We believe these are reasonable choices but we realize that others might focus on, for 
example, more precise estimation (plus or minus 2 percentage points). 






Table 1: Approximate Number of Foreign-Born Respondents Needed to Estimate Percentage Undocumented within 2, 3, or 4 Percentage Points at 90 Percent Confidence Level, Using Two-Card Grouped Answers Data

Estimate within          Percent undocumented foreign-born (range of possibilities)
2, 3, or 4
percentage points        10%       30%       50%       70%       90%
2                        10,700    9,900     8,100     5,500     2,100
3                        4,800     4,400     3,600     2,500     900
4                        2,700     2,500     2,000     1,400     500

Source: GAO analysis.

Note: Estimated numbers of respondents were calculated assuming that (1) foreign-born persons who are not undocumented are evenly split between the legal statuses in Box A, Card 1, and Box A, Card 2 (a conservative assumption in that it maximizes the required number of respondents), (2) sample selection design for the main survey and for subsamples 1 and 2 increases the variance of an estimate of undocumented by a factor of 1.6 (which does not build in potential reductions in variance that might occur with a careful design for the selection of subsamples 1 and 2); and (3) for simplicity, no respondents choose Box C.



Table 2: Approximate Number of Foreign-Born Respondents Needed to Estimate Percentage Undocumented, within 2, 3, or 4 Percentage Points, at 95 Percent Confidence Level, Using Two-Card Grouped Answers Data

Estimate within          Percent undocumented foreign-born (range of possibilities)
2, 3, or 4
percentage points        10%       30%       50%       70%       90%
2                        15,200    14,000    11,500    7,800     2,900
3                        6,800     6,200a    5,100     3,500     1,300
4                        3,800     3,500     2,900     2,000     700

Source: GAO analysis.

Note: Estimated numbers of respondents were calculated assuming that (1) foreign-born persons who are not undocumented are evenly split between the legal statuses in Box A, Card 1, and Box A, Card 2 (a conservative assumption in that it maximizes the required number of respondents), (2) sample selection design for the main survey and for subsamples 1 and 2 increases the variance of an estimate of undocumented by a factor of 1.6 (which does not build in potential reductions in variance that might occur with a careful design for the selection of subsamples 1 and 2); and (3) for simplicity, no respondents choose Box C.

a This is the approximate number of foreign-born respondents needed for an overall estimate of the percentage undocumented with a confidence interval of plus or minus 3 percentage points at the (preferred) 95% confidence level, assuming that 30% of the foreign-born are undocumented.
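The entries in tables 1 and 2 can be approximately reproduced from the assumptions stated in their notes. In our reconstruction (not necessarily GAO's exact computation), the Box A share on each card is p = (1 - u)/2 when a share u is undocumented and the legal statuses split evenly; each card is answered by half the n respondents; and the design effect is 1.6. Then Var(u_hat) = 1.6 x 2 x p(1 - p)/(n/2) = 6.4 p(1 - p)/n, and solving z x sqrt(Var) = h for n gives the sample size. Rounded to the nearest 100, this matches every published cell except one borderline cell in table 1 (plus or minus 2 points, 30 percent), which comes out at about 9,850 and is printed as 9,900. The function name is hypothetical.

```python
from statistics import NormalDist

def required_respondents(undoc_share: float, half_width: float,
                         confidence: float = 0.95, deff: float = 1.6) -> float:
    """Approximate number of foreign-born respondents (both cards combined) needed for
    a grouped answers estimate of the share undocumented within +/- half_width.

    Assumes, as in the notes to tables 1 and 2: legal statuses split evenly between
    Card 1 Box A and Card 2 Box A, design effect deff, and no Box C answers.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 (95%) or 1.645 (90%)
    p = (1 - undoc_share) / 2                       # Box A share on each card
    # Var(u_hat) = deff * [p(1-p)/(n/2) + p(1-p)/(n/2)] = 4 * deff * p(1-p) / n
    return z ** 2 * 4 * deff * p * (1 - p) / half_width ** 2

for conf in (0.90, 0.95):
    print(f"{conf:.0%} confidence")
    for h in (0.02, 0.03, 0.04):
        row = [round(required_respondents(u, h, conf), -2) for u in (0.1, 0.3, 0.5, 0.7, 0.9)]
        print(f"  +/-{h * 100:.0f} points:", [f"{int(n):,}" for n in row])
```

The factor p(1 - p) explains why required sizes fall as the undocumented share rises: the Box A shares shrink toward zero, and so does their sampling variance.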

High-risk subgroups — subgroups with higher percentages of 
undocumented (such as adults 18 to 44 and persons who arrived in the 
United States within the past 10 years) — would require fewer respondents 
for the same level of precision, as illustrated in the tables' middle and right 
columns. For example, if about 70 percent of a subgroup were 
undocumented, a survey with about 3,500 respondents in that subgroup 
would produce an estimate of the percentage of the subgroup that is 



Page 43 



GAO-06-775 Estimating the Undocumented Population 



undocumented, correct to within approximately plus or minus 
3 percentage points at the 95 percent confidence level. 

Precision could be low for smaller subgroups in which there are
relatively few undocumented persons (for example, 10 percent or less),
particularly if, as assumed in tables 1 and 2, there is an even split of
legally present foreign-born persons across the Box A categories of
immigration status cards 1 and 2. 84

The independent statistician we consulted indicated that if more than one 
grouped answers survey is conducted, combining data across two or more 
surveys could help provide larger numbers of respondents for subgroup 
analysis. For example, if a large-scale survey were conducted annually, 
analysts could combine 2 or 3 years of data to obtain more precise 
estimates. (One caveat is that combining data from multiple survey years 
reduces the time-specificity associated with the resulting estimate.) 
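Both the subgroup example in the text (about 3,500 respondents in a subgroup that is about 70 percent undocumented) and the gain from pooling survey years can be sketched with one confidence-interval half-width function. It assumes, as the notes to tables 1 and 2 do, that legal statuses split evenly between the two Box A sets, that the design effect is 1.6, and that no one chooses Box C. Treating pooled years as one larger independent sample is a further simplifying assumption, and the 1,200-respondent subgroup is hypothetical.

```python
import math
from statistics import NormalDist

def half_width(n: int, undoc_share: float, confidence: float = 0.95,
               deff: float = 1.6) -> float:
    """Half-width of the confidence interval for the grouped answers estimate of the
    share undocumented, from n respondents split evenly between cards 1 and 2."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = (1 - undoc_share) / 2  # Box A share on each card (even-split assumption)
    return z * math.sqrt(4 * deff * p * (1 - p) / n)

# Subgroup example from the text: about 70 percent undocumented, about 3,500 respondents.
print(f"{half_width(3500, 0.70) * 100:.1f} points")  # 3.0 points

# Hypothetical subgroup with 1,200 respondents per year, 30 percent undocumented,
# pooled over 1, 2, or 3 annual surveys.
for years in (1, 2, 3):
    print(years, f"{half_width(1200 * years, 0.30) * 100:.1f} points")
```

Because the half-width shrinks with the square root of the pooled sample, combining 3 years narrows the interval by a factor of about 1.7, at the cost of the time-specificity noted above.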

Finally, we note that to estimate the numerical size of the undocumented 
population, 

• A grouped answers estimate of the percentage of the foreign-born who 
are undocumented would be combined with a census count of the 
foreign-born or an updated estimate. For example, the 2000 census 
counted 31 million foreign-born persons, and the Census Bureau later 
issued an updated estimate of 35.7 million for 2005. 

• The specific procedure would be to multiply the percentage 
undocumented (based on the grouped answers data and the 
subtraction procedure) by a census count or an updated estimate of the 
foreign-born population for the year in question. 
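The multiplication step can be written out directly. The sketch below uses the Census Bureau's 2005 estimate of 35.7 million foreign-born persons, cited in the text, together with a hypothetical grouped answers estimate of 30 percent undocumented, plus or minus 3 percentage points. The 30 percent figure is an assumption for illustration, not a survey result, and the function name is hypothetical.

```python
def undocumented_count_range(foreign_born: float, share: float, half_width: float):
    """Multiply a percentage range (share +/- half_width) by a foreign-born total.

    Returns (low, point, high) in the same units as foreign_born.
    """
    return (foreign_born * (share - half_width),
            foreign_born * share,
            foreign_born * (share + half_width))

low, point, high = undocumented_count_range(35.7, 0.30, 0.03)  # millions
print(f"{low:.1f} to {high:.1f} million (point estimate {point:.1f} million)")
# 9.6 to 11.8 million (point estimate 10.7 million)
```

This range reflects only the sampling error in the percentage; any bias in the foreign-born total carries through to all three numbers in proportion.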

The precision of the resulting estimate of the numerical size of the undocumented population would be affected by (1) the precision of the grouped answers percentage estimate, which is closely related to sample size, as described above, and (2) any bias in the census count or updated estimate of the foreign-born population. 85 The precision of the grouped answers percentage is taken into account by using a percentage range (for example, the estimate plus or minus 3 percentage points) when multiplying. Although the amount of bias in a census count or updated estimate is unknown, we believe that any such bias would have a proportional impact on the calculated numerical estimate of the undocumented population. 86

84 However, if the percentage undocumented overall were to sharply decrease, it might be appropriate to change the groupings on the cards to mitigate this factor.

85 Such bias might arise from problems in accurately covering the foreign-born population. An additional caveat is that coverage of the undocumented may be lower than coverage of other foreign-born persons. We examined coverage issues in GAO/GGD-98-164.

To illustrate the proportional impact, we assume that a census count for 
total foreign-born is 5 percent too low. Using that count in the 
multiplication process would cause the resulting estimate of the size of the 
undocumented population to be 5 percent lower than it should be. 87 The 
situation is analogous for subgroups. 88 

Overall, it seems clear that reasonably precise grouped answers estimates 
of the undocumented population and its characteristics require large-scale 
data collection efforts but not impossibly large ones. 



The Most Efficient Field Strategy Does Not Seem Feasible



A low-cost field strategy would be to insert the new question series in an 
existing, nationally representative, large-scale survey — that is, to pose the 
grouped answers questions to the foreign-born respondents already being 
interviewed. However, based on our review of ongoing large-scale
surveys, the insertion strategy does not seem feasible. Specifically, we 
identified four potentially relevant surveys but none met criteria based on 
the grouped answers design and other criteria based on immigrant 
advocates' concerns. 



86 This assumes that the census count or updated estimate is a constant. 

87 Suppose hypothetically that an updated estimate for some future year estimates the 
foreign-born population as 40 million and that a grouped answers estimate of the 
percentage of foreign-born who are undocumented is 30 percent. Multiplying 40 million by 
30 percent would yield an estimate of 12 million undocumented (hypothetical data). 
Further suppose that the true size of the foreign-born population, in that future year, were 
actually 42 million. Multiplying 42 million by 30 percent would yield 12.6 million — 
a result just 5 percent higher than 12 million. 

88 In contrast, analysts have pointed to a potentially disproportionate, magnifying impact of 
bias in census counts (or error in updated estimates of the size) of the foreign-born 
population on residual estimates of the number who are undocumented. See Kenneth Hill, 
"Estimates of Legal and Unauthorized Foreign-Born Population for the United States and 
Selected States Based on Census 2000," presentation at the U.S. Census Bureau 
Conference, Immigration Statistics: Methodology and Data Quality, Alexandria, Virginia, 
February 13-14, 2006. Siegel and Swanson (p. 479) make a similar point. 






The dollar costs associated with inserting a grouped answers module are 
difficult to calculate in advance because many factors are involved. 
However, to suggest the "ball park" within which the cost of a grouped 
answers insert might be categorized, if an insertion were possible, we 
present the following two examples. 

• The GSS test, in which a grouped answers question module was inserted, cost approximately $100 per interview (more than 200 interviews were conducted). On average, the question series took 3.25 minutes. Logically, per-interview costs are likely to be higher in relatively small surveys than in larger surveys with thousands of foreign-born respondents.

• For the much larger Current Population Survey (CPS), with interviews covering native-born and foreign-born persons in more than 50,000 households, the Census Bureau and BLS told us that "an average 10-minute supplement cost $500,000 in 2005." 89 This implies $10 per interview at the 50,000 level, but per-interview costs might be higher when the question series applied to only a portion of the respondents. Additional costs might apply for flash cards and foreign-language interviews. BLS noted that still other costs would apply for advance testing and subsequent analyses requested by the customer.
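The per-interview figures in these two examples are simple ratios of total cost to number of interviews. The sketch below reproduces the $10 CPS figure and a rough lower bound on the GSS test total implied by "approximately $100 per interview" and "more than 200 interviews." It leaves out the additional costs listed above (flash cards, foreign-language interviews, advance testing, and analysis).

```python
def per_interview_cost(total_cost: float, interviews: int) -> float:
    """Average cost per interview."""
    return total_cost / interviews

# CPS: "an average 10-minute supplement cost $500,000 in 2005," about 50,000 households.
cps = per_interview_cost(500_000, 50_000)
print(f"CPS supplement: ${cps:.0f} per interview")  # CPS supplement: $10 per interview

# GSS test: about $100 per interview and more than 200 interviews.
gss_total_lower_bound = 100 * 200
print(f"GSS test total: at least about ${gss_total_lower_bound:,}")  # ... $20,000
```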

A more costly option would be to ask the grouped answers question series 
in a follow-back survey of foreign-born respondents identified in 
interviewing for an existing survey. (In-person self-report interviews can 
cost $400 to $600 each.) More costly still would be the development of a 
new, personal-interview survey of a representative sample of the foreign- 
born population devoted to migration issues; the main reason is that there 
would be additional costs in "screening out" households without foreign- 
born persons. 

We identified four potentially relevant ongoing large-scale surveys. All 
have prerequisites and processes for accepting (or not accepting) new 
questions. We also developed six criteria for assessing the appropriateness 
of each survey as a potential vehicle for fielding the grouped answers 
approach. Three criteria are based on design requirements, and three are 
based on the views of immigrant advocates. We found that no ongoing 
large-scale survey met all criteria. 



89 More than 6,000 of these households included one or more foreign-born persons. 




Four Ongoing Large-Scale Data Collections Sometimes Accept Additional Questions



We identified four nationally representative, ongoing large-scale surveys in 
which respondents are or could be personally interviewed. 90 Three of these 
conduct most or all interviews in person: 

1. the Current Population Survey (CPS), sponsored by BLS and the Census Bureau and fielded by Census;

2. the National Health Interview Survey (NHIS), sponsored by the National Center for Health Statistics (NCHS) and fielded by the Census Bureau; and

3. the National Survey on Drug Use and Health (NSDUH), sponsored by SAMHSA and fielded by RTI International, a private sector contractor.



The fourth survey is the American Community Survey (ACS), a much 
larger survey fielded by the Census Bureau and using "mixed mode" data 
collection. The majority of the data are based on mailed questionnaires or 
telephone interviews, with the remaining data based on personal 
interviews. In addition, there is one personal-interview follow-back survey 
that uses the ACS frame and data to draw its sample. 91 Other follow-back 
surveys might eventually be possible. 

For any of these four surveys, inserting a new question or set of questions 
(or fielding a "follow-back" survey based on respondents' answers in the 
main survey) requires approvals by the Office of Management and Budget 
(OMB), the agencies that sponsor or field the surveys, and in cases in 
which data are collected by a private sector organization, the 
organization's institutional review board. 

The prerequisites for an ongoing survey's accepting new questions typically include low anticipated item nonresponse, pretesting and pilot testing (including debriefing of respondents and interviewers) that indicate a minimum of problems, review by stakeholders to determine acceptability, and tests that indicate no effect on either survey response rates or answers to the main survey's existing questions. 92 Another prerequisite would be the expectation of response validity. 93

90 A fifth survey, SIPP, a large-scale in-person survey, is scheduled to be "reengineered" to provide an "effective alternative to the current SIPP." It is anticipated that administrative data will be combined with survey data, although the exact directions that the revised effort will take are not yet known. (We defined large-scale as 50,000 or more interviews, including native-born and foreign-born respondents. The foreign-born represent about 12 percent of the national population, implying that a survey of 50,000 U.S. residents could be expected to collect data on roughly 6,000 foreign-born persons.)

91 This follow-back survey concerns alcohol use and alcoholism; it is sponsored by the National Institute on Alcohol Abuse and Alcoholism. OMB told us that, in part because ACS is a new survey, very few other follow-up efforts, if any, are likely to be approved in the next few years.

Additionally, multiple agencies mentioned a need for prior "cognitive 
interviewing," compatibility with existing items (so that there is no need to 
change existing items), and no significant increase in "respondent burden" 
(by, for example, substantially lengthening the interview). 94 

Agencies sponsoring or conducting large-scale surveys varied on the 
perceived relevance of immigration to the main topic of their survey. For 
example, BLS noted that some of its customers would be interested in data 
on immigration status by employment status (among the foreign-born), 
and the Census Bureau has indicated the relevance of undocumented 
immigration to population estimation. But some other agencies saw little 
relevance to the large-scale surveys they sponsored or conducted. 
Resistance to including a grouped answers question series might occur 
where an agency perceives little or no benefit to its survey or its 
customers. 

Additionally, one agency raised the issue of informed consent, which we 
discuss in appendix V. 



92 For example, with respect to possible impacts on answers to main-survey questions,
SAMHSA (which sponsors the NSDUH) indicated a concern that asking about immigration 
status might make respondents less likely to provide honest answers to questions about 
illegal behaviors such as drug use (potentially because of fear of such actions as 
deportation). 

93 As we discussed in a previous section, experts told us that it is important to demonstrate 
that respondents, especially undocumented respondents, "pick the correct box" — or at 
least to demonstrate that they intend to pick the correct box (rather than avoiding Box B). 

94 Cognitive interviewing focuses on the mental processes of the respondent while he or she 
is answering a survey question. The goals are to find out what each respondent thinks the 
question is asking, what the specific words or phrases (or icons on a card) mean to him or 
her, and how he or she formulates an answer. Typically, cognitive interviewing is an 
iterative process in which the findings or problems identified in each set of interviews are 
used to modify the questions to be tested in the next set of interviews. 






No Ongoing Large-Scale Data Collection Met Our Criteria

Based on the design of the grouped answers approach, as tested to date, two criteria for an appropriate survey are (1) personal interviews in which respondents can view the 3-box cards and (2) a self-report format in which questions ask the respondents about their own status (rather than asking one adult member of a household to report information on others). A third criterion is that the host survey not include highly sensitive direct questions that could affect foreign-born respondents' acceptance of the grouped answers questions. 95 We based these criteria on the results of the GSS test, our knowledge of the grouped answers approach, and general logic.

As shown in table 3, one of the surveys we reviewed (the CPS) does not 
meet the self-report criterion; that is, it accepts proxy responses. Two 
other surveys (the NHIS and NSDUH) do not meet the criterion of an 
absence of highly sensitive questions, since they include questions on HIV 
status (NHIS) and the use of illegal drugs (NSDUH). Conducting a follow- 
back survey based on ACS would meet all three criteria. 96 



95 For example, if a respondent had already admitted engaging in a behavior related to
illegal activity, he or she might be less likely to accurately answer a question on 
immigration status. Of course, if future testing were to indicate that a particular type of 
sensitive item did not affect immigration responses, this criterion would be dropped. 

96 The ACS is a mixed-mode rather than a solely personal-interview survey. It gathers 
information on all members of a household based, in some cases, on a single adult 
respondent-informant rather than randomly selecting one or more respondents in each 
household and asking them to provide information about themselves. However, one follow- 
back personal interview survey has based its sample selection on the ACS frame and its 
data. We further note that if a follow-back survey based on the CPS could be conducted, 
then — provided that the follow-back was designed for self-report personal interviews — it 
would meet the criteria in table 3. 






Table 3: Survey Appropriateness: Whether Surveys Meet Criteria Based on the Grouped Answers Design

Three design-based criteria:
1. Are the data gathered in personal interviews?
2. Are all respondents selected to self-report?
3. Are direct questions not highly sensitive?

Ongoing survey: Current Population Survey (CPS)
1. YES. Mostly, for in-person waves; 16% of foreign-born interviewed by telephone, in the in-person waves.a
2. No. An adult respondent reports on self and provides proxy responses for others in his or her household. In-person data for 6,744 households with 1 or more foreign-born members (2006).
3. YES, not highly sensitive.

Ongoing survey: National Health Interview Survey (NHIS)
1. YES. Mostly; 17% of foreign-born sample adults interviewed by telephone.
2. YES. For some questions, but not all, 4,829 foreign-born adults self-report (2004).
3. No. There are direct questions on HIV, other STDs.c

Ongoing survey: National Survey of Drug Use and Health (NSDUH)
1. YES. All interviewed in person.
2. YES. 7,364 foreign-born age 12 and older and 4,934 foreign-born age 18+ self-report (2004).
3. No. There are direct questions on respondent's use and sale of drugs like marijuana and cocaine.

Potential follow-back survey: Potential American Community Survey (ACS) follow-back survey, by the Census Bureau, on all or a sample of all foreign-born on whom ACS data were collected
1. YES. A follow-back could specify personal interviews only. (ACS is mixed mode, mostly mail.)
2. YES. A follow-back could specify self-report only. (ACS data include both self-report data and proxy data in which one member of a household provides responses for others.)
3. YES, not highly sensitive.

Source: GAO analysis.

a The CPS includes successive data collections or "waves" to update data over time, at selected households. In some waves, interviews are conducted in person; in others, by telephone.

b Based on the core CPS questionnaire. (Different modules or supplements may be added in particular survey years or CPS waves.)

c HIV refers to human immunodeficiency virus. STDs refers to sexually transmitted diseases.



The views of immigrant advocates, which were echoed by some other 
experts, suggested three additional criteria for a candidate "host" survey: 

1. data collection by a university or private sector organization, 

2. no request for the respondent's name or Social Security number, and 

3. protection from possible release of grouped answers survey data for 
small geographic areas (to guard against estimates of the 
undocumented for such areas). 






The experts based their views on (1) methodological grounds (foreign- 
born respondents would be more likely to cooperate, and to respond 
truthfully, if all or some of these criteria were met) and (2) concerns about 
privacy protections at the individual or group levels. 97 These criteria are 
potentially important, in part because the success of a self-report 
approach hinges on the cooperation of individual immigrants and, most 
likely, also on the support of opinion leaders in immigrant communities. 98 
With respect to the first criterion above, we note that with the exception of 
initial GAO pretests, all tests of the grouped answers approach have 
involved data collection by a university or private sector organization. 
Without further tests, we do not know whether acceptance would be 
equally high in a government-fielded survey. 

As shown in table 4, an ACS follow-back would potentially not meet any 
of the three criteria based on immigrant advocates' views. Only one survey 
(NSDUH) met all three criteria based on immigrant advocates' views — and 
because of its sensitive questions on drug use, that survey did not meet the 
design-based table 3 criteria. 



97 With respect to the individual level, Census Bureau staff told us that they are extremely
careful not to disclose information, that such disclosure is prohibited by law, and that the 
Census Bureau explains this to respondents. However, they also said that some 
respondents erroneously believe that all government agencies share information with one 
another or might do so under certain circumstances. 

98 We note that the relevance of the criteria in table 4 would likely be heightened if interior 
enforcement efforts (that is, those conducted away from border areas) were to sharply 
increase. 






Table 4: Survey Appropriateness: Whether Surveys Meet Table 3 (Design Based) Criteria and Additional Criteria Based on Immigrant Advocates' Views

Columns:
Meets all table 3 (design based) criteria?
Three additional criteria based on immigrant advocates' views:
1. Does a nongovernment organization conduct field work?
2. Are interviews anonymous (that is, no names or Social Security numbers are taken)?
3. Is sample too small for reliable small-area estimates of undocumented?a

Ongoing survey: Current Population Survey (CPS)
Meets all table 3 criteria: No.
1. No. The Census Bureau conducts field work.b
2. No. Takes names.
3. YES.

Ongoing survey: National Health Interview Survey (NHIS)
Meets all table 3 criteria: No.
1. No. The Census Bureau conducts field work.c
2. No. Takes both names and Social Security numbers.
3. YES.

Ongoing survey: National Survey of Drug Use and Health (NSDUH)
Meets all table 3 criteria: No.
1. YES.
2. YES.
3. YES.

Potential follow-back survey: Potential American Community Survey (ACS) follow-back survey by the Census Bureau, on all or a sample of foreign-born on whom data were collected
Meets all table 3 criteria: YES.
1. No. Only the Census Bureau can conduct field work.
2. No. Takes names in the initial survey, and a follow-back would be based on knowing each person's identity.
3. Potentially, no. A follow-back might be extremely large. (Also, small-area releases are not prohibited by law or policy.)

Source: GAO analysis.

Note: Table 3 criteria are personal interviews; respondent reports on himself or herself; no highly sensitive direct questions.

a For this report, we define "small area" as below the county level.

b For CPS, only the Census Bureau can conduct a follow-back.

c For NHIS, a follow-back by a private sector organization might be possible.
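Tables 3 and 4 amount to a screen of four candidate surveys against six yes/no criteria. The sketch below encodes the table entries as booleans and confirms that no survey passes all six. The encoding simplifies some entries: qualified answers such as "YES. Mostly" are treated as yes, and the ACS follow-back's "Potentially, no" on small-area estimates is treated as no. The class and field names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SurveyAssessment:
    name: str
    # Design-based criteria (table 3)
    personal_interviews: bool
    self_report: bool
    no_highly_sensitive_questions: bool
    # Criteria based on immigrant advocates' views (table 4)
    nongovernment_fieldwork: bool
    anonymous: bool
    too_small_for_small_area_estimates: bool

    def meets_design_criteria(self) -> bool:
        return (self.personal_interviews and self.self_report
                and self.no_highly_sensitive_questions)

    def meets_advocate_criteria(self) -> bool:
        return (self.nongovernment_fieldwork and self.anonymous
                and self.too_small_for_small_area_estimates)

    def meets_all(self) -> bool:
        return self.meets_design_criteria() and self.meets_advocate_criteria()

SURVEYS = [
    SurveyAssessment("CPS", True, False, True, False, False, True),
    SurveyAssessment("NHIS", True, True, False, False, False, True),
    SurveyAssessment("NSDUH", True, True, False, True, True, True),
    SurveyAssessment("ACS follow-back", True, True, True, False, False, False),
]

for s in SURVEYS:
    print(f"{s.name:16} design={str(s.meets_design_criteria()):5} "
          f"advocates={str(s.meets_advocate_criteria()):5}")

print("meets all six:", [s.name for s in SURVEYS if s.meets_all()])  # meets all six: []
```

The screen makes the tension explicit: the only survey meeting the design criteria (an ACS follow-back) fails every advocate-based criterion, and the only survey meeting the advocate-based criteria (NSDUH) fails the sensitivity criterion.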

In conclusion, we did not find a large-scale survey that would be an 
appropriate vehicle for "piggybacking" the grouped answers question 
series. 



Page 52 



GAO-06-775 Estimating the Undocumented Population 



Observations

For more than a decade, the Congress has recognized the need to obtain
reliable information on the immigration status of foreign-born persons
living in the United States — particularly, information on the
undocumented population — to inform decisions about changing
immigration law and policy, evaluate such changes and their effects, and
administer relevant federal programs.

Until now, reliable data on the undocumented population have seemed 
impossible to collect. Because of the "question threat" associated with 
directly asking about immigration status, the conventional wisdom was 
that foreign-born respondents in a large-scale national survey would not 
accept such questions — or would not answer them authentically. 



Testing So Far Affirms That the Grouped Answers Approach Is Promising

Using the grouped answers approach to ask about immigration status
seems promising because it reduces question threat and is statistically
logical. Additionally, this report has established that

• The grouped answers approach is acceptable to most foreign-born 
respondents tested (thus far) in surveys fielded by private sector 
organizations; it is also acceptable — with some conditions, such as private 
sector fielding of the survey — to the immigrant advocates and other 
experts we consulted. 

• A variety of research designs are available to help check whether 
respondents choose (or intend to choose) the correct box. 

• The grouped answers approach requires a fairly large number of personal 
interviews with foreign-born persons (we estimate 6,000) to achieve 
reasonably precise indirect estimates of the undocumented population 
overall and within high-risk subgroups. 

However, the most cost-efficient method of fielding a grouped answers 
question series — piggybacking on an existing survey — does not seem 
feasible. Rather, fielding the grouped answers approach would require a 
new survey focused on the foreign-born. This raises two new questions 
about "next steps" — and the answers depend, in large part, on policymaker 
judgments, as described below. 






Two New Questions about "Next Steps"

Question 1: Are the costs of a new survey justified by information
needs? DHS stated (in its comments on a draft of this report) that the
"information on immigration status and the characteristics of those
immigrants potentially available through this method would be useful for
evaluating immigration programs and policies." The Census Bureau has
indicated that information on the undocumented would help estimate the
total population in intercensal years. And an expert reviewer emphasized
that a new survey of the foreign-born would be likely to help estimate the
total population. 99

Additionally, policymakers might deem a new survey of the foreign-born to
be desirable for reasons other than obtaining grouped answers data.
Notably, an immigration expert who reviewed a draft of this report
pointed out that a survey focused on the foreign-born might provide more
in-depth, higher-quality data on that population than existing surveys that
cover both the U.S.-born and foreign-born populations. For example, more
general surveys, such as the ACS and CPS, (1) ask a more limited set of
migration questions than is possible in a survey focused on the
foreign-born, (2) are not designed with a primary goal of maximizing
participation by the foreign-born (for example, are not conducted by
private sector organizations), and (3) as DHS pointed out in comments on a
draft of this report, may not be designed to cover persons who are only
temporarily linked to sampled households, because such persons may have
arrived only recently in the United States and are temporarily staying with
relatives. 100

99 This expert reviewer told us: "One of the biggest issues surrounding immigration is the
scale of in- and out-migration. The failure to understand this process is one of the biggest
reasons that the population estimates were so far off at the time of the 2000 census. A
survey devoted to the foreign-born could be especially helpful in ensuring that we have the
best weights [information on population] possible, particularly if the survey could
accurately estimate illegal aliens."

100 The ACS defines residence in a household as living there for 2 months (either completed
or ongoing). For a discussion of other quality issues in the ACS, see Steven A. Camarota
and Jeffrey Capizzano, "Assessing the Quality of Data Collected on the Foreign Born: An
Evaluation of the American Community Survey (ACS): Pilot and Full Study Findings,"
Immigration Studies White Papers, Sabre Systems Inc., April 2004.
http://www.sabresys.com/whitepapers/CIS_whitepaper.pdf (Sept. 6, 2006).

A new survey aimed at obtaining grouped answers data on immigration
status would require roughly 6,000 (or more) personal, self-report
interviews with foreign-born adults. Other in-person, self-report interviews
in large-scale surveys have cost $400 to $600 each. A major additional cost
would be obtaining a representative sample of foreign-born persons; this 
would likely require a much larger survey of the general population in 
which "mini-interviews" would screen for households with one or more 
foreign-born individuals. 
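As a rough illustration of the interview costs implied by the figures above, the sketch below multiplies the estimated number of interviews by the per-interview cost range. It assumes exactly 6,000 completed interviews and the $400 to $600 range cited for other large-scale surveys, and it deliberately leaves out the screening survey needed to locate foreign-born households, which the text identifies as a major additional cost. The function name is hypothetical.

```python
def interview_cost_range(n_interviews: int,
                         low_cost_each: float,
                         high_cost_each: float) -> tuple[float, float]:
    """Return (low, high) total cost of the self-report interviews alone.

    Screening costs for locating foreign-born households are NOT included.
    """
    if n_interviews < 0 or low_cost_each > high_cost_each:
        raise ValueError("need n_interviews >= 0 and low_cost_each <= high_cost_each")
    return n_interviews * low_cost_each, n_interviews * high_cost_each


# Inputs taken from the text: 6,000 interviews at $400 to $600 each.
low, high = interview_cost_range(6_000, 400, 600)
print(f"Interviews alone: ${low:,.0f} to ${high:,.0f}")  # Interviews alone: $2,400,000 to $3,600,000
```

That is, roughly $2.4 million to $3.6 million for the interviews alone, before the cost of the much larger screening survey is added.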

We did not study the likely costs of such a data collection or options for
reducing costs. However, survey costs can be estimated (based on, for
example, the experience of survey organizations), and policymakers can,
in the future, weigh those costs against the information need, keeping in
mind the results of research on the grouped answers approach to date
and experts' opinions on research needed.

Question 2: What further tests of the grouped answers method, if 
any, should be conducted before planning and fielding a new 
survey? On one hand, advance testing could 

• assess response validity (that is, whether respondents pick — or intend 
to pick — the correct box) before committing funds for a survey and in 
time to allow adjustments to the question series; 

• further delineate respondent acceptance and explore the impact on 
acceptance of factors such as government funding — or funding by a 
particular agency — in order to inform decisions about whether or how 
to conduct a survey; 101 and 

• as suggested in DHS's comments on a draft of this report, help 
determine the cost of a full-scale survey. 102 

101 Potentially, the prospects for private sector funding could be explored. One question
would be whether it is possible to identify a willing private sector source that is not aligned
with a particular perspective on immigration issues.

102 Alternatively, survey costs can be estimated (albeit more roughly) on the basis of the
experience of survey organizations.

On the other hand, extensive advance testing would likely delay the
survey and may not be needed because

• response validity could be assessed — and respondent acceptance could
be further delineated — concurrently with or subsequent to the survey
rather than in advance; 103

• the need for advance testing of response validity would be lessened if
policymakers see a need for more or better survey data on the foreign-born
in addition to the need for grouped answers data on immigration
status (see discussion in question 1, above);

• the value of advance testing would be lessened if changes in 
immigration law and policy occurred between the time of an advance 
test and the main survey, because such changes could affect the 
context in which the survey questions are asked and, hence, change the 
operant levels of acceptance and validity; and 

• survey costs can be estimated — albeit more roughly — on the basis of 
the experience of survey organizations. 

Given the arguments for and against advance testing, it seems appropriate 
for these to be weighed by policymakers. 



Agency Comments

We provided a draft of this report to, and received comments from, the
Department of Commerce, the Department of Homeland Security, and the
Department of Health and Human Services (see appendices VII, VIII, and
IX, respectively). The Office of Management and Budget provided only
technical comments, and the Department of Labor did not comment.

The Census Bureau agreed with the report's discussion of 

• the grouped answers method, including its strengths and 
limitations; 

• the Census Bureau-GSS evaluation, including the conclusions of the 
independent consultant (Alan Zaslavsky); and 

• the need for a "validity study" to determine whether the grouped 
answers method can "generate accurate estimates" of the 
undocumented population. 



103 Validity tests conducted concurrently with the survey and follow-on checks that compare
survey results against (adjusted) administrative information would seem to be appropriate,
if a survey is, in fact, fielded.






The Census Bureau also provided technical comments, which we used to 
clarify the report, as appropriate. 

The Department of Homeland Security stated that the kinds of information 
that the grouped answers approach would provide, if successfully 
implemented, would be useful for evaluating immigration programs and 
policies. DHS further called for pilot testing by GAO to assess the 
reliability of data collection and to help estimate the costs of an eventual 
survey. 104 As we indicate in the "observations" section of this report, two 
key decisions for policymakers concern 

• whether to invest in a new survey and 

• whether substantial testing is required in advance of planning and 
fielding a survey. 

We believe that, depending on the answers to these questions, another
issue — one we cannot address in this report — would concern identifying
the most appropriate agency for conducting or overseeing (1) tests of the
grouped answers approach and (2) an eventual survey of the foreign-born
population. However, we believe that conducting or overseeing such tests
or surveys is a management responsibility and, accordingly, is not
consistent with GAO's role or authorities. DHS made other technical
comments, which we incorporated in the report where appropriate. 105

104 DHS suggested that the pilot testing be conducted within a limited geographic area.

105 For example, DHS pointed to the issue of an existing survey (the American Community
Survey) defining residence in a household as living there for 2 months (either completed or
ongoing). DHS said this would likely exclude some unauthorized and temporary migrants
and indicated that, if a new survey needs to be conducted, it should be designed to cover all
foreign-born persons residing here.

The Department of Health and Human Services (HHS) agreed that the
NSDUH would not be an appropriate vehicle for a grouped answers
question series. Commenting on a draft of this report, HHS said that the
report should include more information on variance calculations and on
"mirror-image" estimates. 106 Therefore, we (1) added a footnote illustrating
the variance costs of a grouped answers estimate relative to a
corresponding direct estimate and (2) developed appendix VI, which gives
the formula for calculating the variance of a grouped answers estimate and
discusses "mirror-image" estimates.

Additionally, HHS said that interviewers should more accurately 
communicate with respondents when presenting the three-box cards. We 
believe that the text of appendix V on informed consent, based on our 
earlier discussions with privacy experts at the Census Bureau, deals with 
this issue appropriately. As we state in appendix V, it would be possible to 
explain to respondents that "there will be other interviews in which other 
respondents will be asked about some of the Box B categories or 
statuses." Finally, HHS made other, technical comments, which we 
incorporated in the report, as appropriate. 

The Office of Management and Budget provided technical comments. In 
addition, our discussions with OMB prompted us to re-order some of the 
points in the "observations" section of the report. 

The Department of Labor informed us that it had no substantive or 
technical comments on the draft of the report. 



We are sending copies of this report to the Director of the Census Bureau, 
Secretary of Homeland Security, Secretary of Health and Human Services, 
Secretary of Labor, Director of the Office of Management and Budget, and 
to others who are interested. We will also provide copies to others on 
request. In addition, the report will be available at no charge on GAO's 
Web site at http://www.gao.gov. 



106 A grouped answers estimate of the percentage of the foreign-born who are
undocumented can be defined as the percentage of subsample 1 who are in Box B, Card 1, 
minus the percentage of subsample 2 who are in Box A, Card 2. Alternatively, a grouped 
answers estimate could be defined as the percentage of subsample 2 who are in Box B, 
Card 2, minus the percentage of subsample 1 who are in Box A, Card 1. If both calculations 
are performed and two estimates are derived, they might be termed "mirror image" 
estimates. 
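The definition in the footnote above translates directly into a few lines of code. The sketch below is a minimal, unweighted illustration: each subsample is represented as a list of box letters ("A", "B", or "C") chosen on the respondent's assigned card, and the function names and sample answers are hypothetical. The standard-error helper is a textbook simple-random-sampling approximation for two independent subsamples; it ignores survey weights and design effects and should not be read as the variance formula the report gives in appendix VI.

```python
import math
from collections.abc import Sequence


def pct_in_box(boxes: Sequence[str], box: str) -> float:
    """Percentage of respondents in a subsample who chose the given box."""
    if not boxes:
        raise ValueError("subsample is empty")
    return 100.0 * sum(1 for b in boxes if b == box) / len(boxes)


def mirror_image_estimates(card1: Sequence[str],
                           card2: Sequence[str]) -> tuple[float, float, float]:
    """Return (estimate 1, estimate 2, their average), in percent undocumented.

    card1: boxes chosen by subsample 1 (shown Card 1)
    card2: boxes chosen by subsample 2 (shown Card 2)
    """
    est1 = pct_in_box(card1, "B") - pct_in_box(card2, "A")  # Box B, Card 1 minus Box A, Card 2
    est2 = pct_in_box(card2, "B") - pct_in_box(card1, "A")  # Box B, Card 2 minus Box A, Card 1
    return est1, est2, (est1 + est2) / 2


def srs_standard_error(p_b_card1: float, n1: int, p_a_card2: float, n2: int) -> float:
    """Approximate standard error (percentage points) of estimate 1.

    Assumes simple random sampling and independent subsamples; proportions in [0, 1].
    """
    var = p_b_card1 * (1 - p_b_card1) / n1 + p_a_card2 * (1 - p_a_card2) / n2
    return 100.0 * math.sqrt(var)


# Hypothetical answers: 100 respondents per subsample.
card1 = ["A"] * 36 + ["B"] * 40 + ["C"] * 24
card2 = ["A"] * 29 + ["B"] * 46 + ["C"] * 25
print(mirror_image_estimates(card1, card2))  # (11.0, 10.0, 10.5)
print(round(srs_standard_error(0.40, 100, 0.29, 100), 2))
```

With these made-up answers the two mirror-image estimates differ (11.0 and 10.0 percent) only because of sampling variation built into the hypothetical counts; averaging them gives 10.5 percent.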






If you or your staff have any questions regarding this report, please call me 
at (202) 512-2700. Contact points for our Offices of Congressional 
Relations and Public Affairs may be found on the last page of this report. 
Other key contributors to this assignment were Judith A. Droitcour, 
Assistant Director, Eric M. Larson, and Penny Pickett. Statistical support 
was provided by Sid Schwartz, Mark Ramage, and Anna Maria Ortiz. 




Nancy R. Kingsbury, Managing Director 
Applied Research and Methods 






Appendix I: Scope and Methodology 



To gain insight into the acceptability of the grouped answers approach, we 
discussed the approach with numerous experts in immigration studies and 
immigration issues, including immigrant advocates. Table 5 lists the 
experts we met with and their organizations. 



Table 5: Experts GAO Consulted on Immigration Issues or Immigration Studies

| Name and title | Organization |
| --- | --- |
| Steven A. Camarota, Director of Research | Center for Immigration Studies |
| Robert Deasy, Director, Liaison and Information | American Immigration Lawyers Association a |
| Crystal Williams, Deputy Director | American Immigration Lawyers Association a |
| J. Traci Hong, Director of Immigration Program | Asian American Justice Center a |
| Terry M. Ao, Director of Census and Voting Programs | Asian American Justice Center a |
| Guillermina Jasso, Professor of Sociology | New York University |
| Benjamin E. Johnson, Director of Policy, Immigration Policy Center | American Immigration Law Foundation a |
| John L. (Jack) Martin, Director, Special Projects | Federation for American Immigration Reform |
| Julie Kirchner, Deputy Director of Government Relations | Federation for American Immigration Reform |
| Douglas S. Massey, Professor of Sociology and Public Affairs | Princeton University |
| Mary Rose Oakar, President | American-Arab Anti-Discrimination Committee a |
| Thomas A. Albert, Director of Government Relations | American-Arab Anti-Discrimination Committee a |
| Leila Laoudji, Deputy Director of Legal Advocacy | American-Arab Anti-Discrimination Committee a |
| Kareem W. Shora, Director, Legal Department and Policy | American-Arab Anti-Discrimination Committee a |
| Demetrios G. Papademetriou, President | Migration Policy Institute |
| Jeffrey S. Passel, Senior Research Associate | Pew Hispanic Center |
| Eric Rodriguez, Director, Policy Analysis Center | National Council of La Raza a |
| Michele L. Waslin, Director, Immigration Policy Research | National Council of La Raza a |
| Helen Hatab Samhan, Executive Director | Arab American Institute Foundation a |
| James J. Zogby, President | Arab American Institute a |
| Rebecca Abou-Chedid, Government Relations and Policy Analyst | Arab American Institute a |
| Nidal M. Ibrahim, Executive Director | Arab American Institute a |

Source: GAO.

Note: Other immigration experts we briefly consulted with by telephone or e-mail or in conversations
at an immigration conference included George Borjas, Professor of Economics and Public Policy,
Harvard University; Georges Lemaitre, Directorate for Employment, Labour, and Social Affairs,
Organisation for Economic Co-operation and Development, Paris, France; Enrico Marcelli, Assistant
Professor of Economics, University of Massachusetts at Boston; Randall J. Olson, Director, Center
for Human Resource Research, The Ohio State University; and Michael S. Teitelbaum, Vice
President, Alfred P. Sloan Foundation, New York.

a Organization advocating for immigrants or expressly dedicated to representing their views. We call
such organizations immigrant advocates, although some may not, for example, lobby for legislation.






To ensure that we identified immigration experts from varied perspectives, 
we consulted Demetrios G. Papademetriou, who is among the immigration 
experts listed in table 5, and Michael S. Teitelbaum, Vice President of the 
Alfred P. Sloan Foundation. With respect to immigrant advocates, we
sought to include advocates who represented (1) immigrants in general, 
without respect to ethnicity; (2) Hispanic immigrants, as these are the 
largest group of foreign-born residents; (3) Asian American immigrants, as 
these are also a large group; and (4) Arab American immigrants, as these 
have been the target of interior (that is, nonborder) enforcement efforts in 
recent years. 

To determine what the 2004 General Social Survey (GSS) test indicated 
about the acceptability of grouped answers questions to foreign-born 
respondents and its "general usability" in large-scale surveys, we
obtained the Census Bureau's report of its analysis of those data, and we 
assessed the reliability of the GSS data through a comparison of answers 
to interrelated questions. Then we 

• submitted the Census Bureau's report of its analysis to Dr. Alan 
Zaslavsky, an independent expert, for review; 

• developed our own analysis of the GSS data and submitted our paper 
describing that analysis to the same expert; 1 and 

• summarized the expert's conclusions as well as our own and appended his
report and the Census Bureau's report (reproduced in appendixes III and
IV). 2

1 The independent review considered the Census Bureau and GAO analyses of the GSS data
in terms of (1) their overall reasonableness and thoroughness, given the general objective
(describing respondents' acceptance and understanding), (2) key points of difference
(if any) between the two analyses or differences in conclusions, (3) whether the analyses
raised unanswered questions that should be addressed, and (4) whether the conclusions
appeared to be justified. The reviewer was also free to comment on other aspects of the
analyses.

2 We believe this report independently addresses respondent acceptability because we
(1) focus on the results of the GSS test (rather than critiquing the Census Bureau's work),
(2) report how the method performed rather than subjectively assessing its merit, and
(3) rely on an independent expert.

We used these procedures to ensure independence, given that the GSS test
was based on our earlier recommendation that the Census Bureau and the
Department of Homeland Security (DHS) test the grouped answers
approach. 3

To describe additional research that might be needed, we outlined the 
grouped answers approach and reviewed the main conclusions of the GSS 
test in meetings with the immigration experts listed in table 5 and with 
private sector statisticians. 4 Additionally, we discussed the approach with 
various federal officials and staff at agencies responsible for fielding large- 
scale surveys. 5 

To assess the precision of indirect estimates, we addressed questions to 
Dr. Zaslavsky, developed illustrative tables showing hypothetical 
calculations under specified assumptions, and subjected those tables to 
review. 

To identify and describe candidate surveys for piggybacking the grouped 
answers question series, we set minimum criteria for consideration 
(nationally representative, mainly or only in-person interviews, and data 
on at least 50,000 persons overall, including native-born and foreign-born). 
Then we identified surveys that met those criteria, collected documents 
concerning the surveys, and interviewed officials and staff at federal 
agencies that sponsored or conducted those surveys. We also talked with 
experts in immigration about additional key criteria for selecting an 
appropriate survey. 

The scope of our work had several limitations. We did not attempt to 
collect new data from foreign-born respondents in a survey, focus group, 
or other format. We did not assess census or survey coverage of the 



3 DHS contributed to the funding of the Census Bureau's contract with the National Opinion 
Research Center (NORC) for the insertion of a module (question series) into the GSS. 

4 We consulted with Alan Zaslavsky, Fritz Scheuren, and Mary Grace Kovar. 

5 In our earlier work, we consulted with numerous other private sector experts on
immigration and statistics. For those experts, see GAO/GGD-00-30, p. 29.






foreign-born or undocumented populations. 6 We did not assess 
nonresponse rates among foreign-born or undocumented persons selected 
for interview. We did not review alternative methods of obtaining 
estimates of the undocumented. 

While we consulted a number of private sector experts and sought to 
include a range of perspectives, other experts may have other views. 
Finally, we do not know to what extent the broad range of persons who 
compose immigrant communities share the views of the immigrant 
advocates we spoke with. 



6 In 1998, we recommended that the Commissioner of the Immigration and Naturalization
Service (INS) and the Director of the Census Bureau "devise a plan of joint research for 
evaluating the quality of census and survey data on the foreign-born," based on our 
discussion of the need to evaluate coverage and possible methods for doing so (see 
GAO/GGD-98-164). This recommendation is still open. In 2002, Census Bureau staff 
assumed that 15 to 20 percent of the undocumented were not enumerated in the 1990 
census and stated the belief that coverage of this group improved in the 2000 census. 
(See Joseph Costanzo and others, "Evaluating Components of International Migration: 
The Residual Foreign-Born," Population Division Working Paper 61, U.S. Census Bureau, 
Washington, D.C., June 2002, p. 22.) However, the Census Bureau has not quantitatively 
estimated the coverage of either the foreign-born population overall or the undocumented 
population. 






Appendix II: Estimating Characteristics, 
Costs, and Contributions of the 
Undocumented Population 



Logically, grouped answers data can be used to estimate subgroups of the 
undocumented population, using the following procedures: 

1. isolate survey data for (a) the subsample 1 respondents who are in the 
desired subgroup, based on a demographic or other question asked in 
the survey (for example, if the survey included a question on each 
respondent's employment, data could be isolated for foreign-born who 
are employed), and (b) subsample 2 respondents in that subgroup; 

2. calculate (a) the percentage of the subsample 1 subgroup respondents 
who are in each box of immigration status card 1 and (b) the 
percentage of subsample 2 subgroup respondents who are in each box 
of immigration status card 2; and 

3. carry out the subtraction procedure (percentage in Box B, Card 1, 
minus percentage in Box A, Card 2), thus estimating the percentage of 
the subgroup who are undocumented. 

The resulting percentage can be multiplied by a census count or an 
updated estimate of the foreign-born persons who are in the subgroup (for 
example, multiply the estimate of the percentage of employed foreign-born 
who are undocumented by the census count or updated estimate of the 
number of employed foreign-born). 
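The three steps and the scaling step above can be sketched as follows. This is a minimal, unweighted illustration under stated assumptions: the `Respondent` record, its `employed` field (standing in for any subgroup-defining survey question), the sample answers, and the census count used for scaling are all hypothetical, and a real analysis would apply survey weights.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Respondent:
    subsample: int  # 1 = shown Card 1, 2 = shown Card 2
    box: str        # "A", "B", or "C", as chosen on the assigned card
    employed: bool  # hypothetical subgroup-defining answer from another question


def subgroup_pct_undocumented(respondents: list[Respondent],
                              in_subgroup: Callable[[Respondent], bool]) -> float:
    # Step 1: isolate subgroup members within each subsample.
    s1 = [r for r in respondents if r.subsample == 1 and in_subgroup(r)]
    s2 = [r for r in respondents if r.subsample == 2 and in_subgroup(r)]
    if not s1 or not s2:
        raise ValueError("subgroup is empty in at least one subsample")
    # Step 2: percentages in the two boxes needed for the subtraction.
    pct_b_card1 = 100.0 * sum(r.box == "B" for r in s1) / len(s1)
    pct_a_card2 = 100.0 * sum(r.box == "A" for r in s2) / len(s2)
    # Step 3: Box B, Card 1, minus Box A, Card 2.
    return pct_b_card1 - pct_a_card2


def scale_to_count(pct_undocumented: float, subgroup_count: float) -> float:
    """Multiply the subgroup percentage by a census count or updated estimate."""
    return pct_undocumented * subgroup_count / 100.0


def make(subsample: int, employed: bool, counts: dict[str, int]) -> list[Respondent]:
    """Build hypothetical respondents with the given number choosing each box."""
    return [Respondent(subsample, box, employed)
            for box, k in counts.items() for _ in range(k)]


sample = (make(1, True, {"A": 4, "B": 4, "C": 2}) + make(1, False, {"A": 5, "B": 1, "C": 4})
          + make(2, True, {"A": 3, "B": 5, "C": 2}) + make(2, False, {"A": 2, "B": 3, "C": 5}))
pct = subgroup_pct_undocumented(sample, lambda r: r.employed)
print(pct)                             # 10.0
print(scale_to_count(pct, 1_000_000))  # 100000.0 (hypothetical count of employed foreign-born)
```

Passing the subgroup test as a function keeps the same code usable for any subgroup the survey can identify, such as those with or without health insurance.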

These steps can be repeated to indirectly estimate the size of the 
undocumented population within various subgroups defined by activity, 
demographics, and other characteristics (such as those with or without 
health insurance) that are asked about in the survey. Without an extremely 
large survey, it would be difficult or impossible to derive reliable estimates 
for subgroups with few foreign-born persons or few undocumented 
persons. Ongoing surveys conducted annually have sometimes combined 
2 or 3 years of data in order to provide more reliable estimates of low- 
prevalence groups; however, there is a loss of time-specificity. 



Program cost data are sometimes available on an average per-person 
basis, and surveys sometimes ask about benefit use. In such cases, the 
total costs of a program associated with a certain group can be estimated. 
Program costs associated with the undocumented population might be 
estimated by either (1) multiplying the estimated numbers of 
undocumented persons receiving benefits by average program costs or 
(2) performing the following procedures: 



Key Characteristics 
Can Be Estimated 



Some Program Costs 
Can Be Estimated 






1. Isolate survey data for all foreign-born subsample 1 respondents who 
said they were in Box B of Card 1 and estimate each individual 
respondent's program cost. 1 Then aggregate the individual costs to 
estimate the total program cost (potentially, millions or billions of 
dollars) associated with the population of foreign-born persons defined 
by the group of immigration statuses in Box B, Card 1. 

2. Isolate data for all foreign-born subsample 2 respondents who said they 
were in Box A of Card 2 and, as above, estimate each individual 
respondent's program costs, aggregating these to estimate the total 
program costs associated with the population of foreign-born persons 
defined by the immigration statuses in Box A, Card 2 (again, potentially 
millions or billions of dollars). 

3. Because the only difference between the immigration statuses in Box 
B, Card 1, and Box A, Card 2, is the inclusion of the undocumented 
status in Box B, Card 1, start with the total program cost estimate for 
all Box B, Card 1, respondents and subtract the corresponding cost 
estimate for Box A, Card 2, respondents. 

The result of the subtraction procedure represents an indirect estimate of 
program costs associated with the undocumented population. A more 
precise cost estimate can be obtained by calculating an additional "mirror 
image" cost estimate — this time, starting with costs estimated for 
respondents in Box B of Card 2 and subtracting costs associated with 
respondents in Box A of Card 1. The two "mirror image" estimates could
then be averaged. 
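The cost subtraction and its mirror image can be sketched in the same way. In this hypothetical illustration each record carries the box chosen, a survey weight that expands its subsample to the full foreign-born population, and an estimated individual program cost (derived, for example, as described in footnote 1); the weights, costs, and names are assumptions for illustration only.

```python
from typing import NamedTuple


class CostRecord(NamedTuple):
    box: str       # "A", "B", or "C" on the respondent's assigned card
    weight: float  # hypothetical weight expanding this subsample to the full population
    cost: float    # estimated individual program cost, in dollars


def total_cost(records: list[CostRecord], box: str) -> float:
    """Weighted total program cost for respondents who chose `box`."""
    return sum(r.weight * r.cost for r in records if r.box == box)


def undocumented_cost_estimates(sub1: list[CostRecord],
                                sub2: list[CostRecord]) -> tuple[float, float, float]:
    est1 = total_cost(sub1, "B") - total_cost(sub2, "A")  # Box B, Card 1 minus Box A, Card 2
    est2 = total_cost(sub2, "B") - total_cost(sub1, "A")  # mirror image
    return est1, est2, (est1 + est2) / 2


sub1 = [CostRecord("B", 1000, 500.0), CostRecord("B", 1000, 0.0),
        CostRecord("A", 1000, 300.0), CostRecord("C", 1000, 100.0)]
sub2 = [CostRecord("A", 1000, 200.0), CostRecord("B", 1000, 600.0),
        CostRecord("B", 1000, 100.0), CostRecord("C", 1000, 50.0)]
print(undocumented_cost_estimates(sub1, sub2))  # (300000.0, 400000.0, 350000.0)
```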

The key limitations on such procedures are sample size and the
representation of key subgroups — for example, foreign-born respondents
residing in small states and local areas. Thus, for example, it is possible
that state-level costs associated with undocumented persons might be
estimated with reasonable precision for a large state or city with many
foreign-born persons and a relatively high percentage of undocumented
(potentially, California or New York City) but not for many smaller states
or areas, unless very large samples (or samples focused on selected areas
of interest) were drawn. Further work could explore the ways that
complex analyses could be conducted to help delineate costs.

1 Program costs associated with an individual respondent (or with those in very
refined subgroups) are sometimes estimated based on a combination of (1) answers to
specific questions (such as whether the person is attending public school in the school
district where he or she lives or how many emergency room visits he or she made) and
(2) separately available information on program costs per individual (for example, the
per-pupil costs of public education in specific school districts or the per-visit costs of
emergency room care).



Contributions Might 
Be Estimated 



Contributions can be conceptualized as contributions to the economy 
through work or, potentially, through taxes paid. Such contributions might 
be estimated by combining grouped answers data with other survey 
questions to estimate relevant subgroups, such as employed 
undocumented persons. In complex analyses, these data could potentially 
be combined with other data to help estimate taxes paid. 



Logically, Estimates 
Can Be Made of 
Undocumented 
Children 



Other Estimates May 
Be Possible 



Logically, other quantitative estimates might be obtained through 
procedures similar to those outlined above for estimating program costs. 
For example, the numbers of children in various immigration statuses 
might be estimated by asking an adult respondent how many foreign-born 
children (or how many foreign-born school-age children) reside in the 
household and then — using the 3-box card assigned to the adult 
respondent — asking how many of these children are in Box A, Box B, and 
Box C. 2 We note that, thus far, testing has not asked respondents to report 
children's immigration status with the grouped answers approach. 
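A comparable sketch for children follows, assuming each adult respondent reports how many foreign-born children in the household fall in Box A, Box B, and Box C of the adult's card, and assuming a hypothetical household weight. As noted above, this kind of reporting has not yet been tested; the names and numbers are illustrative only.

```python
from typing import NamedTuple


class HouseholdReport(NamedTuple):
    weight: float             # hypothetical household weight
    children: dict[str, int]  # foreign-born children reported in each box of the adult's card


def weighted_children(reports: list[HouseholdReport], box: str) -> float:
    """Weighted number of children the adults placed in `box`."""
    return sum(r.weight * r.children.get(box, 0) for r in reports)


def undocumented_children(sub1: list[HouseholdReport],
                          sub2: list[HouseholdReport]) -> tuple[float, float]:
    """Return the two mirror-image estimates of the number of undocumented children."""
    est1 = weighted_children(sub1, "B") - weighted_children(sub2, "A")
    est2 = weighted_children(sub2, "B") - weighted_children(sub1, "A")
    return est1, est2


sub1 = [HouseholdReport(500.0, {"A": 1, "B": 2, "C": 0}),
        HouseholdReport(500.0, {"A": 0, "B": 1, "C": 1})]
sub2 = [HouseholdReport(500.0, {"A": 1, "B": 1, "C": 0}),
        HouseholdReport(500.0, {"A": 0, "B": 2, "C": 0})]
print(undocumented_children(sub1, sub2))  # (1000.0, 1000.0)
```

No child is ever classified individually; only group totals are subtracted.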

If subsamples 1 and 2 are sufficiently large, it might also be possible to
estimate the portion of the undocumented population represented by

• "overstays" who were legally admitted to this country for a specific
authorized period of time but remained here after that period
expired (without a timely application for extension of stay or
change of status) 3 and

• currently undocumented persons who are applicants for legal
status and are waiting for DHS to approve (or disapprove) their
application.

2 Potentially, based on the location of the responding household, state and local per-pupil
school costs could be obtained. Totaling state and local school costs for foreign-born
children in each box would be followed by a group-level subtraction. In this way, the costs
of schooling undocumented immigrant children could be estimated — nationally and
potentially for key states — without ever categorizing any child as undocumented and
without ever estimating the number of undocumented children in any school district.

3 See GAO, Overstay Tracking: A Key Component of Homeland Security and a Layered
Defense, GAO-04-82 (Washington, D.C.: May 21, 2004).

Estimating overstays would require a separate question on whether the
respondent had entered the country on a temporary visa. 4 Estimating
undocumented persons with pending applications would require a
separate question concerning pending applications for any form of legal
status (including, for example, applications for U.S. citizenship as well as
applications for legal permanent resident status and other legal statuses).

The precision of such estimates would depend on factors such as sample 
size, the percentages of foreign-born who came in on temporary visas or 
who have pending applications of some kind, and the numbers of 
undocumented persons within these groups. 



4 See Judith A. Droitcour and Eric M. Larson, "An Innovative Technique for Asking Sensitive 
Questions: The Three-Card Method," Bulletin de Methodologie Sociologique, 75 (July 2002): 
5-23. 






Appendix III: A Review of Census Bureau and 
GAO Reports on the Field Test of the 
Grouped Answer Method 



A Review of Census Bureau and GAO Reports on the 
Field Test of the Grouped Answer Method 

Alan Zaslavsky 
Harvard Medical School 
July 8, 2006 

A field test of the "Grouped Answer Method" (GAM) for estimating the number of 
undocumented immigrants was conducted by the National Opinion Research Center (NORC) in 
the context of the 2004 General Social Survey (GSS). A descriptive report on this test was 
prepared by the Bureau of the Census and a further report by the Government Accountability 
Office (GAO). This is a review of these two documents, focusing on what is shown by the 
analyses and what questions remain to be answered. (The Census Bureau report refers to the 
method as the "Three Card Method" (3CM), but in fact the method could be implemented with 
two or three different card forms.) 

Major findings 

General usability: The test confirms the general usability of the GAM with subjects similar to
the target population for its potential large-scale use, that is, foreign-born members of the general
population. Of about 218 respondents who met the eligibility criteria and were most likely
administered the cards in person (possibly including a few who had telephone interviews but
responded without problems), only 9 did not respond by checking one of the 3 boxes. Of these, 3
provided verbal information that allowed a box to be coded, and 6 declined to answer the
question altogether. Furthermore, several of these respondents raised similar difficulties with other
3-box questions on nonsensitive topics (type of house where born, mode of transportation to enter
the United States), suggesting that their difficulties were at least in part related to the format and
not to the particular content of the answers. Thus there was no indication of a systematic bias
arising from respondents whose immigration status is more sensitive being unwilling to answer in
the 3-box format.

Telephone administration: Of 232 otherwise eligible respondents, 14 were identified as
telephone respondents. Of these, 10 were identified because they were followed up in tracking
data after failing to provide usable information in response to the GAM item. While it is not
known how many interviews were done by telephone altogether, the number is believed to be
only a relatively small fraction of the entire survey. Thus, item nonresponse was largely a
problem of telephone interviewing. The higher nonresponse rate for telephone interviewees was
not surprising given the complexity of the response format (6 categories grouped into 3 boxes),
the reliance of the item on the visual metaphor of boxes, the use of graphics to help respondents
remember the categories, and the difficulty of comprehending the categories verbally and
remembering the groupings while answering. In particular, the way in which the 3-box method
conceals the sensitive responses would be much less obvious in a telephone interview.
Unfortunately, NORC could not yet determine exactly how many telephone interviews were
administered altogether, so an item nonresponse rate among telephone interviews could not be
calculated. (NORC plans to disclose individual data on mode of interview (telephone versus
in-person) by the end of 2006, which will make it possible to calculate item response rates by
mode, in-person versus telephone.) However, for the reasons mentioned, and given the
concentration of problems in telephone interviews, it seems likely that the success rate of the
method would be much lower for telephone respondents than for in-person respondents. In
future implementations of this method, it would be crucial to address this issue, either by
(1) attaching the question to a survey that makes relatively little use of telephone interviews or
(2) sending the respondent a card in advance of the interview that could be referred to for visual
cues for the item. If these solutions were not practical, it might be possible to develop a verbal
form of the item adapted to telephone use, but this would require some laboratory testing.

Limitations of this study 

Single card form: An important limitation of the NORC field test is that only one card form was
tested. This design limitation is understandable, since implementing a multiform protocol adds to
the complexity of a study and might well be judged excessively burdensome for a supplementary
item. Nonetheless, it means that this test cannot answer questions about differences across card
forms in rates of nonresponse or in procedural difficulties in responding to the items. It is also
likely that, even with multiple forms, this test would have been underpowered to answer more
refined questions about differential rates of nonresponse: with only 9 nontelephone item
nonrespondents, a split-sample comparison would have had power to detect only the most
extreme differences in nonresponse rates. However, it is reasonable to generalize about the
comprehensibility of the items from this test, even with a single form, since rearranging the
options in the boxes would not be expected to affect the usability of the question.
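A rough power calculation shows why a split-sample comparison at this size could detect only extreme differences. The sketch below is an assumption-laden illustration, not an analysis from the review. It uses the standard normal approximation for a two-sided, two-sample test of proportions. It assumes the roughly 218 in-person respondents are split evenly into two arms of 109, with a baseline item nonresponse rate near 4 percent (9 of 218). The function name `approx_power` and the alternative rates are illustrative. Under these assumptions, a doubling of nonresponse from 4 to 8 percent would be detected less than half the time.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1: float, p2: float, n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test of proportions.

    Normal approximation: pooled variance under the null, unpooled variance
    under the alternative; the (negligible) far rejection tail is ignored.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    se_alt = sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    return nd.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


# About 218 in-person respondents split into two arms of 109; baseline item
# nonresponse near 4 percent. The alternative rates below are illustrative only.
for p2 in (0.08, 0.12, 0.16):
    print(f"4% vs {p2:.0%}: power ~ {approx_power(0.04, p2, 109):.2f}")
```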

GSS coverage limitations: Some limitations of GSS coverage made the test unrepresentative of
the target population of foreign-born persons. Compared with rates estimated from the Current
Population Survey (CPS), the foreign-born are undercovered by the GSS (8.4% of GSS
respondents versus 14.5% of adults in the CPS), with particular undercoverage of recent
immigrants and those from Latin America. The CPS itself likely undercovers recent immigrants,
particularly the undocumented, so the undercoverage problem might be even greater than
comparison with the CPS reveals. Of course, by the same token, the CPS and other existing
surveys are likely to be affected by undercoverage to some extent. Special methods might be
required to cover concentrations of immigrant population that include high rates of
undocumented immigrants. The main concern in relation to the conclusions of the field test is
whether the performance of the items, that is, their acceptability and comprehensibility, would
be different either in these special populations or with special methods used to target them.
Within the GSS test, the problem cases were not notably concentrated among recent immigrants
or those with more limited English proficiency. This suggests that the GAM items did not rely on
highly culturally specific references or potentially confusing language. However, within a
community largely made up of undocumented immigrants, even a "mixed" box might be
regarded as more identifying, and therefore more sensitive, than in a more heterogeneous
community. For example, in a migrant labor camp in which there are few citizens, identifying
oneself as "citizen or undocumented immigrant" (as opposed to a noncitizen with legal status)
might be regarded as tantamount to admitting illegal status, while this would not be the case in a
general population.

English only: Another concern is the use of English only in the GSS. Many of the issues here
are similar to those identified in the preceding paragraph in relation to undercoverage of recent
immigrants. Indeed, the restriction to English-speaking respondents might explain some of the
undercoverage of recent immigrants noted above. The additional issue raised specifically by the
English-only restriction is whether the instructions would be clear in other languages. It might be
expected, however, that because the format of the item is largely graphical, it would not be highly
sensitive to translation.

Questions for further study 

Equivalence of acceptability of the alternative response cards: As noted above, only one form 
of the response card was tested in the GSS implementation. Future studies should use all (two or 
three) alternative versions of the card, to evaluate whether item nonresponse is equivalent for all 
of the forms, indicating comparable acceptability of the forms. 

Effects of nonresponse and incorrect responses on estimates: The effect of nonresponse and
noncomprehension on the quality of estimates from the GAM depends critically on the exact
form these problems take, not just on the percentage of responses that are missing or invalid.
If the group that does not respond to the item is the same regardless of which card form is used,
then the effect of nonresponse can be understood as simple undercoverage of that nonrespondent
group. Thus, among respondents, the analysis proceeds as if with complete data, and the
unknowns concern only the characteristics of the nonrespondents, a group whose size is known.
The effects of nonresponse can be bounded by assuming alternatively that none or all of the
nonrespondents are undocumented immigrants. These extremes might be implausible, especially
if qualitative information about the nonrespondents (like that collected in the GSS test, or
potentially relationships of nonresponse to characteristics in larger implementations) suggests
that the nonrespondents do not generally look like undocumented immigrants. Such an argument
could be used to develop plausible, tighter bounds on the overall fraction of undocumented
immigrants. A simple assumption would be that nonrespondents include a similar fraction of
undocumented immigrants as respondents, which would allow the respondents to be used to
make estimates for the entire population.
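The bounding argument can be made concrete. The following is a minimal sketch, assuming that nonresponse is card-independent (so it acts as undercoverage of a group of known size) and that an estimated undocumented share among respondents is already available. In practice that share would come from the grouped-answers subtraction and carry its own sampling error, which the sketch ignores. The function name and the 10 percent figure are hypothetical; the GSS test used a single card and produced no such estimate.

```python
def undocumented_share_bounds(n_resp: int, share_resp: float, n_nonresp: int):
    """Bound the overall undocumented share when nonrespondents' status is unknown.

    Assumes nonresponse is the same group whatever card is shown, so it acts
    as simple undercoverage of a group of known size.
    """
    total = n_resp + n_nonresp
    undoc_among_resp = share_resp * n_resp
    lower = undoc_among_resp / total                # no nonrespondent is undocumented
    upper = (undoc_among_resp + n_nonresp) / total  # every nonrespondent is undocumented
    same_rate = share_resp                          # nonrespondents resemble respondents
    return lower, same_rate, upper


# Hypothetical: 209 respondents with an estimated 10 percent undocumented,
# plus 9 item nonrespondents (218 in all).
low, mid, high = undocumented_share_bounds(209, 0.10, 9)
print(f"{low:.3f} <= {mid:.3f} <= {high:.3f}")
```

Qualitative information about the nonrespondents would justify replacing the extreme lower and upper values with tighter, more plausible ones.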

If nonresponse depends on which card is presented, the analysis of the implications is somewhat
more complex, since not only the size of the nonrespondent group but also its distribution across
categories could depend on the card. Note that this latter effect would not be evident if overall
nonresponse rates are the same across cards. For a simple example, suppose that 10% of
citizens would decline to respond to the card that groups citizens with undocumented
immigrants, but would respond when citizens are ungrouped. Suppose that legally resident
noncitizens behave similarly. Then the boxes including undocumented immigrants would lose
10% of their citizen (or legally resident) members with either card, reducing the estimate of
undocumented immigrants by the same amount even if all the undocumented immigrants
responded accurately. Many other such scenarios could be constructed. Thus it would be useful
to study in larger samples the factors associated with refusal to respond, particularly to
investigate whether the reasons given by respondents seem to be associated with the grouping on
the card. The evidence from the GSS test, however, does not point toward complex nonresponse
patterns like those hypothesized in this paragraph.
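The 10% scenario above can be worked through numerically. This sketch rests on assumptions that are not in the review. It uses a hypothetical two-card layout in which card 1 groups citizens with the undocumented and card 2 groups legal noncitizens with the undocumented. Shares are taken over the full subsample, with nonrespondents kept in the denominator, and there is no Box C. The population shares and the function name are made up. With 40% citizens, 40% legal noncitizens, and 20% undocumented, both subtraction estimates come out at 0.16 rather than 0.20. Meanwhile, item nonresponse is 4% on each card, so comparing overall nonresponse rates would not reveal the problem.

```python
def card_dependent_nonresponse(c: float, l: float, u: float, decline: float):
    """Two-card estimates when people decline only the card grouping them with the undocumented.

    Hypothetical layout:
      card 1: Box A = legal noncitizens, Box B = {citizens, undocumented}
      card 2: Box A = citizens,          Box B = {legal noncitizens, undocumented}
    Shares are over the full subsample (nonrespondents stay in the denominator),
    and every undocumented respondent answers accurately.
    """
    card1_box_a, card1_box_b = l, (1 - decline) * c + u
    card2_box_a, card2_box_b = c, (1 - decline) * l + u
    est_via_card1 = card1_box_b - card2_box_a   # equals u - decline * c
    est_via_card2 = card2_box_b - card1_box_a   # equals u - decline * l
    nonresponse = (decline * c, decline * l)    # item nonresponse rate on each card
    return est_via_card1, est_via_card2, nonresponse


# Made-up shares: 40% citizens, 40% legal noncitizens, 20% undocumented; 10% decline.
e1, e2, (nr1, nr2) = card_dependent_nonresponse(0.40, 0.40, 0.20, 0.10)
print(round(e1, 3), round(e2, 3), round(nr1, 3), round(nr2, 3))  # 0.16 0.16 0.04 0.04
```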

Finally, similar issues apply with respect to response errors (responding but checking the wrong
box). A number of possibilities must be considered. If a subgroup of legal immigrants
systematically reports the wrong immigration status (for example, legal immigrants authorized to
work in the United States who check the box for citizens) but this is unaffected by the grouping
of categories, this will have no effect on the estimates for the undocumented. This might be the
case, for example, if some of these respondents are misinformed about their own status or
confused about the meaning of the categories. However, if they systematically avoid the box
containing the undocumented (checking the box for citizens or for legal noncitizen immigrants,
as the case may be), this will tend toward underestimation of the undocumented. If some
undocumented immigrants systematically misreport their status, this will also bias the estimates,
especially if they systematically avoid the box containing undocumented status. The GSS study
does not address this issue.
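The contrast between grouping-independent misreporting and box avoidance can be checked with a small calculation. This sketch is built on assumptions: a hypothetical two-card layout in which card 1 groups citizens with the undocumented and card 2 groups legal noncitizens with the undocumented, a fraction `m` of legal noncitizens who check the citizen category, and no Box C. The function name and shares are illustrative. When the misreport happens on both cards, it cancels in the subtraction. When it happens only to avoid the undocumented box, the estimate falls by `m` times the legal-noncitizen share.

```python
def misreport_estimates(c: float, l: float, u: float, m: float, avoid_undocumented_box: bool):
    """Two-card estimates when a fraction m of legal noncitizens check the citizen category.

    Hypothetical layout:
      card 1: Box A = legal noncitizens, Box B = {citizens, undocumented}
      card 2: Box A = citizens,          Box B = {legal noncitizens, undocumented}
    avoid_undocumented_box=False: the misreport occurs on both cards, whatever the grouping.
    avoid_undocumented_box=True: it occurs only when their true box also holds the undocumented.
    """
    if avoid_undocumented_box:
        card1_box_a, card1_box_b = l, c + u                    # true box lacks undocumented: truthful
        card2_box_a, card2_box_b = c + m * l, (1 - m) * l + u  # some move from Box B to citizens
    else:
        card1_box_a, card1_box_b = (1 - m) * l, c + u + m * l
        card2_box_a, card2_box_b = c + m * l, (1 - m) * l + u
    return card1_box_b - card2_box_a, card2_box_b - card1_box_a


# Made-up shares: 40% citizens, 40% legal noncitizens, 20% undocumented; m = 25%.
print(misreport_estimates(0.4, 0.4, 0.2, 0.25, avoid_undocumented_box=False))  # both near 0.2
print(misreport_estimates(0.4, 0.4, 0.2, 0.25, avoid_undocumented_box=True))   # both near 0.1
```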

Effects of mode and mode alternatives: The GSS results support the view that the multiple-card
items are usable with in-person interviews but more problematic with telephone interviews.
Some questions of interest include the following:

(1) Can the problems with telephone surveys be remedied by sending a response card before
the interview? What effect would such a card have on rates of difficulties in
telephone interviews?

(2) Is there potential for use of mail as a response mode for GAM surveys? A mail survey 
would benefit from the same graphical presentation as with the card used in person, but 
there would be no opportunity to explain the question further to respondents who were 
confused by the format. However, if the method were workable in a mail survey, it 
would open up many more potential applications for the method. 

(3) Computer-assisted self-interviewing (CASI) allows respondents to enter answers directly
into the computer without the interviewer seeing them. CASI has been used to reduce the
effect of sensitive items by giving the respondent a greater sense of privacy. Might CASI
have a similar effect for items about immigration status?

Special populations: non-English speaking (linguistically isolated), low literacy, high density 
of (undocumented) immigrants: Tests should be conducted to evaluate the performance of the 
items in populations with these characteristics, each of which was poorly or not at all represented 
in the GSS and might have an effect on ability or willingness to complete the item. 

Screening questions: The description of possible citizenship questions in the GAO report (pages
17-18) suggests the possibility of additional screening for citizenship to improve the precision
of the estimates for the undocumented. To explain this concept, suppose that a 3-box item is
asked in which undocumented immigrant status appears in a box combined with citizens, and in
the alternative card form citizens appear alone. The estimate of the undocumented is obtained by
subtracting the percentage in the latter box from the percentage in the former (based on two
distinct halves of the split sample). If there were no other questions about citizenship, the
estimate would be subject to large variance because it would be based on subtracting two large
percentages, each subject to sampling variability, to obtain a small difference. At the other
extreme, if there were another item or set of items on the survey that asked about citizenship,
then all of the citizens could be identified directly, and with the first card form, undocumented
status could be deduced for each respondent. In that case the second form could be dispensed
with, and the precision of estimates using the first form would be the same as with a direct
question on status. (This configuration of items is described purely to illustrate a statistical
principle. It must be emphasized that a questionnaire set up in this way would be contrary to the
methodological and ethical principles underlying use of the GAM. It would be unethically
deceptive, since the implicit promise that undocumented status is not revealed for individuals
would be violated. It would also be methodologically dubious, since at least some
respondents would likely sense the revealing nature of the combination of items.) The method
used in the GSS excludes the native-born from answering the GAM item, thereby limiting the
population for this item to the foreign-born. This represents a beneficial compromise between
the two extreme options described above because it makes the "citizen" group smaller and
therefore reduces error. Note that although this exclusion was used as a screener in the GSS
(routing the native-born past the 3-box item) to shorten average survey length, this was not
statistically necessary, since the native-born could have been excluded afterwards. This suggests,
however, that there might be other ways of asking additional immigration questions that would
not fully identify the undocumented but would still help reduce the number of respondents
sharing a box with the undocumented. The concerns in doing this would be ethical
(confidentiality) and the possibility that including too many items on status would interfere with
respondent cooperation, so any changes in this direction should be considered with the utmost
caution to make sure that they improve on the current proposal of using a nativity question as a
screener.
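The precision argument in this paragraph can be illustrated with the usual standard-error formula for a difference of two independent sample proportions. This is a minimal sketch, assuming simple random sampling, two independent half-samples of equal size, and no design effect. The undocumented share (20% of the foreign-born), the citizen shares, and the sample size are hypothetical, and the function names are illustrative. A direct status question appears only as a precision benchmark; as the paragraph stresses, it is not a recommended design. For these values, the grouped-answers standard error exceeds the direct-question benchmark, and it shrinks when fewer citizens share the box with the undocumented.

```python
from math import sqrt


def se_grouped_difference(p_grouped: float, p_alone: float, n_per_half: int) -> float:
    """Standard error of p_grouped - p_alone estimated from two independent half-samples."""
    return sqrt(p_grouped * (1 - p_grouped) / n_per_half + p_alone * (1 - p_alone) / n_per_half)


def se_direct(u: float, n_total: int) -> float:
    """Benchmark standard error if status were asked directly of every respondent."""
    return sqrt(u * (1 - u) / n_total)


u = 0.20      # hypothetical undocumented share among the foreign-born
n_half = 500  # hypothetical respondents per card form
for citizen_share in (0.40, 0.10):
    se = se_grouped_difference(citizen_share + u, citizen_share, n_half)
    print(f"citizens {citizen_share:.0%}: grouped-answers SE {se:.4f}")
print(f"direct question SE {se_direct(u, 2 * n_half):.4f}")
```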

Summary of questions for future field tests: To summarize points appearing above, the 
following issues should be addressed in future field tests: 

(a) Equivalent acceptability of all forms of the response card, 

(b) Usability with special populations including those with low literacy, the linguistically 
isolated, and concentrated immigrant populations, 

(c) Methods that avoid telephone interviews, or reduce bias and nonresponse due to use of the 
telephone, 

(d) Use of followup questions to improve the accuracy of box choices. 






Appendix IV: A Brief Examination of 
Responses Observed while Testing an 
Indirect Method for Obtaining Sensitive 
Information 



A Brief Examination of 
Responses Observed While 
Testing an Indirect Method for 
Obtaining Sensitive Information 



March 2, 2006 



Luke J. Larsen 
Immigration Statistics Staff 
U.S. Census Bureau 






The Three-Card Method 

Developed by the U.S. Government Accountability Office (GAO) in the late 1990s,
the three-card method (3CM) is designed to obtain accurate estimates of the unauthorized
foreign-born population in the United States while accomplishing the following tasks:

• Reducing the psychological stress that stems from asking a question about 
such a sensitive topic as illegal immigration and 

• Eliminating the possibility that any one respondent could be identified as an 
illegal immigrant 

This is accomplished by drawing three random sub-samples from the foreign-born
population and administering to each sub-sample a different variation of the migrant status
question (each in the form of a card that is shown to respondents, hence the name "three-card
method"). For this question, foreign-born respondents are asked to indicate which of
three migrant-status categories they belong to:

• A specific status, such as "lawful permanent resident," 

• A collection of four other statuses, including "unauthorized migrant," or 

• A "catch-all" group for people whose statuses do not fit into the other two 
categories. 

For each question variant, the status in the first group is swapped with one of the statuses in
the second group, so that each sub-sample has a different configuration of categories (in no
instance is the unauthorized migrant status listed in the first group). When the data have
been collected, the migrant status estimates from all three sub-samples are
combined to obtain an indirect estimate of undocumented migrants in the entire sample.

In a 1998 "recommendations report," GAO requested that the U.S. Census Bureau
conduct a test of the 3CM in a field environment. 2 To perform this test, the Census Bureau
contracted with the National Opinion Research Center (NORC) of the University of
Chicago to add a set of 3CM-oriented questions, including one designed to ask about
migrant status, to NORC's 2004 General Social Survey (GSS).

About NORC and the GSS 

Established in 1941, NORC specializes in objective public opinion research in 
many areas of public policy interest, including health, labor, and education. Many survey 
projects administered by NORC provide a wealth of social indicators based on the attitudes 
and opinions of the public, while other studies focus on program evaluation, social 
experiments, needs assessments, and epidemiological case control designs. NORC has also 
proven itself to be a pioneer in the growing field of survey methodology, pushing forward 
improvements in data collection through electronic means and emphasizing the importance 
and utility of objective public opinion research. 



1 The foreign-born population includes anyone who was not a U.S. citizen or a U.S. national at birth. All
others, including those who were born abroad or at sea of at least one parent who was a U.S. citizen, belong
to the native population.

2 U.S. Government Accountability Office. Immigration Statistics: Information Gaps, Quality Issues Limit
Utility of Federal Data to Policymakers (GAO/GGD-98-164). Washington, D.C.: GAO, July 1998.






Prominent among survey products administered by NORC is the GSS, a biennial survey
(since 1994; nearly annual from 1972 to 1993) that collects data on a number of
demographic and attitudinal variables from a national area probability sample of adult
respondents. In addition to the core demographic and attitudinal variables, the GSS also
implements a series of special-interest topical question modules on a rotational basis and,
from time to time, experiments based on question wording, context effects,
validity/reliability assessments, and other methodological issues. Because of the wide
scope of topical content and the focus on objective data collection, the GSS has become a
popular and valuable resource for academic researchers, policymakers, and the mass media
alike.

Methodology 

The 3CM, as originally developed by GAO, did not conform to the survey design 
specifications of the GSS. Therefore, NORC was unable to administer three variations of 
the migrant status question to each of three separate samples. Instead, NORC used a 
modified version of the 3CM, wherein only one version of the migrant status question (in 
which Box A is for those who are lawful permanent residents) was administered within the 
entire GSS sample. Though this modification limited our ability to analyze the full 3CM 
and draw conclusions, we can use the 3CM data from the GSS to test how respondents 
react to the migrant status question and how well they understand the question format. 

NORC did not insert the 3CM questions directly into the core survey instrument, 
but instead appended them to the survey in the form of a question module. This module 
was not given to all respondents; rather, it was administered only to those who were born 
outside the United States (as determined by their responses to a question in the core 
instrument). Thus, while this filtering method was successful in exposing all foreign-born
respondents to the 3CM question module, it also allowed U.S. natives born abroad to
answer the module. However, the focus of this analysis is solely on the foreign born.

The 3CM question module in the 2004 GSS consisted of three 3CM-designed
questions to be administered to the respondent and two standard questions asked of the
field representative (FR). The first two 3CM questions were primer questions that served to
familiarize the respondent with the question format, the visual aids, and expected response
behavior (specifically, indicating to which of the three groups the respondent belongs).
The third question, which asks about the respondent's migrant status, is the focal point of
the question module. When the respondent had completed these three questions, the FR
was asked to evaluate whether the respondent appeared to understand the 3CM
question format and whether the respondent objected or hesitated to answer the migrant
status question.






Analysis 

Demographic Characteristics 

The total respondent count, both native and foreign born, of the 2004 GSS was
2,812 people. 3 Of the total respondent pool, 237 people (8.4 percent) were foreign born. 4
The distributions of the foreign born in the sample and of the total sample from the 2004 GSS are
shown in Table 1 across six demographic variables: sex, age, Hispanic origin, marital
status, educational attainment, and world region of birth. 5 Additionally, it would be
worthwhile to know how these distributions compare with national estimates produced by the
Census Bureau. We can obtain this information from estimates provided by the 2004
Annual Social and Economic Supplement (ASEC) to the Current Population Survey (CPS).
For example, in 2004, the U.S. adult (aged 18 years and over) foreign-born population 6 of
31.1 million people represented 14.5 percent of the total adult population according to the
2004 ASEC, a share significantly larger than the 8.4 percent given by the GSS
sample. 7 The distributions of the foreign-born population and the total population from the
2004 ASEC across the same demographic variables are also shown in Table 1.

Comparing the GSS and ASEC distributions revealed some interesting information
about the composition of the GSS sample. 8 For example, the foreign-born and total
distributions by age and the foreign-born distributions by sex were not statistically different
between the two data sources; however, the total GSS sample had a larger proportion of
women than the ASEC estimates. Also, foreign-born distributions of
world region of birth showed that the GSS sample has less representation (relative to the



3 This is the number of completed cases and does not include refusals, break-offs, and other forms of
nonresponse. According to NORC, the 2004 GSS had a nonresponse rate of 29.6 percent. For more details, see
Davis, James Allan; Smith, Tom W.; and Marsden, Peter V. General Social Surveys, 1972-2004: Cumulative
Codebook. Chicago: National Opinion Research Center, 2005. (National Data Program for the Social
Sciences Series, no. 18).

4 The GSS does not have a variable that directly identifies respondents as being U.S. natives or foreign born.
For this review, the foreign born were designated as those who reported being born outside the United States,
were not born in Puerto Rico, and reported neither parent as being born in the United States.

5 The GSS data cited in this report are unweighted counts and should not be construed as population estimates.

6 The population universe of the ASEC is limited to the civilian noninstitutionalized population in the United
States, though some members of the armed forces may be included if they live with family members in
off-post housing; for brevity, this universe will be denoted in this report as the total population. Likewise, the
civilian noninstitutionalized foreign-born population as measured by ASEC will simply be referred to as the
foreign-born population.

7 All comparison tests presented in this report have taken sampling error into account and are significant at the
90-percent confidence level, unless otherwise stated.

8 Comparisons by marital status, educational attainment, and Hispanic origin are not described in the text
because the population universes for the GSS data and the publicly available ASEC data lack comparability.
See Table 1 for further details.



point estimates from the ASEC distributions) of those born in Latin America and more
representation of those born in Europe. 9

Responses to the Migrant Status Question

Among the 237 foreign-born respondents in the GSS sample, 87 people (36.7
percent) indicated belonging to Box A (lawful permanent resident), 128 people (54.0
percent) indicated belonging to Box B (U.S. citizen, student/work/tourist visa,
undocumented, or refugee/asylee), 1 person (0.4 percent) indicated belonging to Box C
(other category not in Boxes A or B), 4 people (1.7 percent) gave a response other than Box
A, B, or C, and 17 people (7.2 percent) were nonrespondents who either refused to answer
the question or gave a "don't know" response. That roughly 90 percent of foreign-born
respondents gave preferred responses (Boxes A, B, or C) indicates that most foreign
born who are asked about their migrant status in this format would understand the question,
know the answer, and answer willingly.
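As a quick arithmetic check, the percentages above can be recomputed from the reported counts. The counts are taken from the text; the dictionary keys are illustrative labels. The preferred share (Boxes A, B, or C) works out to 216 of 237, about 91.1 percent, which is consistent with "roughly 90 percent."

```python
# Migrant status responses among the 237 foreign-born GSS respondents (counts from the text).
counts = {
    "Box A": 87,          # lawful permanent resident
    "Box B": 128,         # citizen, student/work/tourist visa, undocumented, or refugee/asylee
    "Box C": 1,           # other category
    "other response": 4,  # answer not in Box A, B, or C
    "nonresponse": 17,    # refused or "don't know"
}

total = sum(counts.values())
assert total == 237

percentages = {k: round(100 * v / total, 1) for k, v in counts.items()}
preferred = counts["Box A"] + counts["Box B"] + counts["Box C"]

for k, pct in percentages.items():
    print(f"{k:15s} {counts[k]:4d} {pct:5.1f}%")
print(f"preferred (A, B, or C): {preferred} of {total} = {100 * preferred / total:.1f}%")
```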

Field Representative Responses to the "Understand" and "Objection" Questions

The field representatives reported that 190 of the foreign-born respondents (80.5
percent) appeared to understand the 3CM question format, whereas 22 respondents (9.2
percent) appeared not to understand the format. For another 14 respondents (5.9 percent),
the field representatives gave an "other" response to this question, and for 10 more
(4.2 percent) the field representatives did not respond (including one case in which the field
representative response was missing). It appears that there was some confusion among the field
representatives about how to answer this question, since all responses should have been "yes"
or "no." The cross-tabulation of the migrant status question and the understanding
question appears to support this statement; for example, of the 14 respondents whose field
representatives assigned an "other" response to the understanding question, 12 gave
preferred responses to the migrant status question. Depending on whether the "other,"
"refused," and "don't know" responses are assigned as "yes" or "no," the results indicate
that between 10 and 20 percent of the respondents did not appear to understand the 3CM
question format.

The field representatives also reported that 216 of the foreign-born respondents
(91.5 percent) did not raise an objection, hesitate, or remain silent when asked the migrant
status question. Only 5 respondents (2.1 percent) raised a verbal objection, and 4
respondents (1.7 percent) either hesitated to answer or remained silent. As with the
"understanding" question, some field representatives appeared to misunderstand the
"objection" question: 2 respondents were assigned a response of "other," and 9 were
designated as non-respondents (again, one field representative response was missing).
Interestingly, 3 respondents who objected to the migrant status question actually gave a
preferred response, as did 3 respondents who hesitated to answer (so they did not remain
silent). Also, 3 people who answered the question immediately gave an "other" response,
and 3 more either refused to answer or replied with
"don't know." However, the overwhelming majority of foreign-born respondents gave a
preferred response (Boxes A, B, or C) to the migrant status question without objection,
hesitation, or silence.

The representation of those born in either Asia or Other Regions was not significantly different between the
GSS sample and the ASEC estimates. "Other Regions" includes Northern America, Africa, and Oceania.

Page 77

GAO-06-775 Estimating the Undocumented Population

Appendix IV: A Brief Examination of
Responses Observed while Testing an Indirect
Method for Obtaining Sensitive Information






Response Patterns to the Migrant Status Question by Characteristic 

Twenty-one foreign-born respondents (8.9 percent) in the survey did not give a
preferred answer to the migrant status question; that is, they gave an "other" response
(4 people, or 1.7 percent), a "don't know" response (11 people, or 4.7 percent), or a refusal
to answer the question (6 people, or 2.5 percent). It is important to know whether these
non-preferred responses to the 3CM-based migrant status question are more likely to occur
in certain demographic groups within the foreign-born population. Therefore, we
examined the distribution of non-preferred responses to the migrant status question across
age, sex, Hispanic origin, marital status, educational attainment, and world region of birth.
Because there are not enough cases to establish that non-preferred responses are influenced
by one or more characteristics, we studied these data only for clues to patterns that might
emerge with a larger response pool.

Of the six demographic variables studied, only age and sex appeared to show
disproportionate distributions of non-preferred responses. Specifically, "don't know"
responses were more prevalent among the older foreign born (aged 45 years and over; 7
people) than among the younger foreign born (18 to 44 years old; 4 people), even though the
younger group outnumbered the older group by a wide margin. Also, refusals were more
prevalent among foreign-born females (5 people) than males (1 person), even though the
foreign born in the sample were about equally distributed by sex. Apart from these two
instances, the data suggested no relationship between the four remaining demographic
variables and the patterns of non-preferred responses to the migrant status question.
However, the small number of foreign born in the sample, and the even smaller number of
respondents with non-preferred responses, make it difficult to determine whether these
trends are particularly pronounced.
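An exact binomial tail probability shows why counts this small cannot establish a relationship: if non-preferred responses were spread in proportion to group size, how often would at least 7 of the 11 "don't know" answers fall in the older group, or at least 5 of the 6 refusals among women? This is an illustrative sketch, not the analysis the authors performed. It assumes the group shares are the foreign-born-in-sample percentages in Table 1 (36.4 percent aged 45 and over, 51.1 percent female) and treats respondents as independent draws.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly by summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# "Don't know" answers: 7 of 11 came from respondents aged 45 and over,
# a group assumed to make up 36.4 percent of the foreign born in sample.
p_age = binomial_upper_tail(7, 11, 0.364)

# Refusals: 5 of 6 came from women, assumed to be 51.1 percent of the sample.
p_sex = binomial_upper_tail(5, 6, 0.511)

print(f"age: expected {11 * 0.364:.1f} older, tail probability {p_age:.3f}")
print(f"sex: expected {6 * 0.511:.1f} female, tail probability {p_sex:.3f}")
```

Under these assumptions both tail probabilities (roughly 0.06 and 0.12) sit above the conventional 0.05 threshold, consistent with the caution that the response pool is too small to establish a pattern.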

Respondent Comments Regarding the 3CM 

While administering the 3CM question module, field representatives were
instructed to record respondents' verbal comments about each question and to submit
their own comments for the two representative-directed questions. They were also
instructed to enter respondents' answers when these did not conform to the 3CM format;
such answers make up the category of responses known as "other." We shifted from
quantitative analysis to these qualitative data in an attempt to learn more about how
respondents and field representatives perceive and respond to the 3CM questions. One
finding from this analysis is that 25 respondents (10.5 percent) did not simply state which
migrant status group they belonged to, but described their status in implicit ("been in
country since age 6") or explicit ("I have a visa") terms. This number may actually be
larger, since some field representatives might not have entered the respondents' comments.
This raises the issue of how field representatives handled such responses. In some cases,
when a respondent made such a comment, the field representative entered a response of
"other," but in other cases, the response was set to one of the boxes. This inconsistent
coding suggests that field representatives may have used their own judgment to assign
responses based on respondents' actual answers.






Another useful finding is that the 3CM question format became problematic when
the survey was administered by telephone. As previously stated, the GSS is conducted face
to face in most cases, but if a sampled person is not available when the field representative
comes to the home, a follow-up attempt is made by telephone. Because the 3CM is designed
for use in a face-to-face setting, both respondents and field representatives had trouble with
the question module over the phone. This is evidenced in the comment fields, where field
representatives stated in two cases that they were unable to administer the questions over
the phone. Because we cannot assume that every field representative made a note about
difficulty administering the module over the phone, we do not know how many follow-up
interviews this problem affected.

Conclusion 

In compliance with the GAO recommendations, the U.S. Census Bureau conducted a
field test of the three-card method (via NORC and the GSS) and analyzed the results. In
summary, we found that nine out of ten foreign-born respondents to the migrant status
question gave format-appropriate answers (Box A, B, or C), eight out of ten appeared to
understand the format of the 3CM questions, and nine out of ten did not raise an objection,
remain silent, or hesitate to answer when asked the migrant status question. Furthermore,
the non-preferred responses to the migrant status question ("other," "don't know," or
"refusal") did not appear to be strongly related to any of the six demographic variables
under consideration. We also found several operational issues with the data, such as the
tendency of some respondents to indicate their specific migrant status despite instructions
not to do so, the inconsistent coding of responses by field representatives when given an
answer other than a "box" response, and the difficulty of administering 3CM-designed
questions outside a face-to-face environment.






Table 1: Comparison of 2004 GSS Sample and 2004 CPS ASEC Estimates by Nativity
and Selected Characteristics (in percent)¹

                                          2004 GSS               2004 CPS ASEC²
Characteristic                    Foreign born       Total  Foreign-born         Total
                                     in sample      sample    population    population
Sex³
  Male                                    48.9        45.5          50.3          48.3
  Female                                  51.1        54.5          49.7          51.7
Age³
  18 to 44 years                          63.6        50.2          60.9          51.5
  45 years and over                       36.4        49.8          39.1          48.5
Hispanic origin⁴
  Hispanic (of any race)                  27.4         8.7          45.2          12.4
  Not Hispanic                            72.6        91.3          54.8          87.6
Marital status⁵
  Currently or previously married         80.2        78.0          74.4          71.0
  Never married                           19.8        22.0          25.6          29.0
Educational attainment⁶
  At least high school diploma            53.4        87.0          67.2          85.6
  At least bachelor's degree              39.8        28.0          27.3          26.3
World region of birth³,⁷
  Europe                                  23.8           X          13.9             X
  Asia                                    28.5           X          25.8             X
  Latin America                           38.3           X          52.9             X
  Other Regions⁸                           9.3           X           7.4             X
    Africa                                 7.2           X            NA             X
    Australia                              0.4           X            NA             X
    Canada                                 1.7           X            NA             X

Sources: National Opinion Research Center (2004 GSS) and U.S. Census Bureau (2004 CPS ASEC).

Note: "X" indicates "not applicable"; "NA" indicates "not available." Some distributions may not add to 100.0 percent due to rounding.

1 The GSS data cited in this table are based on unweighted counts and should not be construed as population estimates.

2 The population universe of the CPS is restricted to the civilian non-institutionalized population in the United States, though some
members of the Armed Forces may be included if they live with family members in off-post housing. For brevity, this report refers
to this population as the total population.

3 The ASEC-based foreign-born and total population estimates for age, sex, and world region of birth are for the adult (18 years or
older) population, in order to be more comparable with the adult-only GSS sample.

4 The ASEC-based total population estimates regarding Hispanic origin are for the adult population, while the foreign-born estimates
regarding Hispanic origin are for those aged 25 years or older. Since most of the Hispanic foreign born were born in Latin America,
and because most of the foreign born aged 18 to 24 years were born in Latin America (66.0 percent, based on 2004 CPS ASEC data),
the share of Hispanic foreign-born adults in the U.S. would likely be greater than the share of Hispanic foreign born aged 25 years or
older.

5 The ASEC-based foreign-born and total population estimates regarding marital status are for those who are aged 15 years or older.
Since relatively few people under the age of 18 tend to get married, the share of currently or previously married people aged 18 and
older among the foreign-born and total populations would likely be greater than the foreign-born and total population shares of
currently or previously married people aged 15 and over, and the corresponding never married shares would likely be lower.

6 The ASEC-based foreign-born population estimates for educational attainment are based on those who are aged 25 years or older,
while the total population estimates are based on those who are aged 18 years or older. Since those aged 18 to 24 years are less likely
than older people in the total population to have attained either at least a high school diploma (77.9 percent and 85.2 percent,
respectively) or at least a bachelor's degree (8.4 percent and 27.7 percent, respectively), the shares of adult foreign born who attained
at least a high school diploma or at least a bachelor's degree would likely be smaller than those shares shown for the foreign born aged
25 years or older, assuming that educational attainment trends for the total population aged between 18 and 24 years can be transferred
to the foreign-born population of the same age group.

7 Because the focus of this report is upon the foreign-born population, we chose to examine the world regions of birth only for the
foreign born.

8 "Other Regions" includes Northern America, Africa, and Oceania.
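The regional shares in Table 1 can be compared informally with a one-sample z statistic for a proportion, which is one simple way to check the statement that the Asia and Other Regions shares did not differ significantly between the GSS sample and the ASEC estimates. This sketch is not the test the authors used. It assumes a foreign-born-in-sample size of 236 (the total implied by the response counts reported earlier in this appendix), treats the unweighted GSS shares as a simple random sample, and treats the ASEC percentages as fixed population values.

```python
from math import sqrt

N_FOREIGN_BORN = 236  # assumed GSS foreign-born-in-sample size

# (GSS foreign born in sample, CPS ASEC foreign-born population), in percent
REGIONS = {
    "Europe": (23.8, 13.9),
    "Asia": (28.5, 25.8),
    "Latin America": (38.3, 52.9),
    "Other Regions": (9.3, 7.4),
}


def one_sample_z(sample_pct: float, population_pct: float, n: int) -> float:
    """z statistic comparing a sample proportion with a fixed reference proportion."""
    p_hat, p0 = sample_pct / 100, population_pct / 100
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


for region, (gss, asec) in REGIONS.items():
    z = one_sample_z(gss, asec, N_FOREIGN_BORN)
    verdict = "differs" if abs(z) > 1.96 else "no significant difference"
    print(f"{region:14s} z = {z:+.2f}  ({verdict} at the 5 percent level)")
```

Under these assumptions, Asia and Other Regions fall inside the ±1.96 band, while Europe appears overrepresented and Latin America underrepresented in the GSS sample relative to the ASEC estimates.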






Appendix V: The Issue of Informed Consent 



Appropriately informing each respondent about what information he or 
she is being asked to provide is a key issue. On one hand, the grouped 
answers approach logically conveys to each respondent exactly what he or 
she is being asked to reveal about himself or herself; no one we spoke with 
suggested otherwise. On the other hand, the grouped answers question 
series does not indicate that the respondent is being asked to participate 
in an effort that will result in estimates of all immigration statuses. 
Therefore, a statement is needed to convey this information. 

Officials and staff at the National Center for Health Statistics (NCHS) were 
particularly concerned about this issue and believed that failing to 
adequately address informed consent issues could be considered 
unethical. 1 

Privacy protection specialists at the Census Bureau said that 

• An introductory statement before the first immigration-related 
question might be phrased, "The next questions are geared to 
helping us know more about immigration and the role that it plays in 
American life." 

• When each respondent is shown the 3-box training cards, it would 
be possible to explain to him or her that — while the survey does not 
ask, and does not want to know, the specifics of which Box B 
category applies to him or her — there will be other interviews in 
which other respondents will be asked about some of the Box B 
categories or statuses. 2 

• Just before showing each respondent the immigration status card, it 
should be stated — and, in fact, interviewers stated in the test with 
Hispanic farmworkers — that "Using the boxes allows us to obtain 
the information we need, without asking you to give us information 
that you might not want to." Further: "Because we're using the 
boxes, we WON'T 'zero in' on anything somebody might not want to 
tell us." 3 



1 None of the immigration experts we interviewed raised this issue, however.

2 Thus far, testing has included only one immigration status card, so test interviewers have 
not told respondents that other respondents will be providing information on some of the 
Box B statuses. 

3 See GAO/GGD-00-30. 






• It may also be possible to explain that the study's goal is to allow 
researchers to broadly estimate all categories or statuses on the card 
for the population of immigrants — but to indicate that this will be 
done without ever asking questions that "zero in" on something that 
some respondents might not want to disclose in an interview. 

• Neither the estimation method (that is, the two cards) nor the
specific policy relevance of immigration-status estimates would have
to be described to all respondents. However, interviewers should be
given prepared statements for replying to respondents who have
doubts or questions.






Appendix VI: A Note on Variances and 
"Mirror Image" Estimates 



The statistical expression and variance of a grouped answers estimate are
as follows. The starting point is the proportion of subsample 1 who are in
Box B, Card 1, and the procedure is to subtract from this the proportion of
subsample 2 who are in Box A, Card 2 (with cards and boxes defined as in
figure 3):1

$$\text{Grouped answers estimate} = p_1 - p_2$$

where

$p_1$ = the proportion of subsample 1 in Box B, Card 1
$p_2$ = the proportion of subsample 2 in Box A, Card 2

$$\operatorname{Var}(p_1 - p_2) = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}$$

where

$q_1 = 1 - p_1$ = the proportion of subsample 1 not in Box B, Card 1
$q_2 = 1 - p_2$ = the proportion of subsample 2 not in Box A, Card 2
$n_1$ and $n_2$ = numbers of respondents in subsamples 1 and 2,
respectively.
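The estimator and variance formulas above can be evaluated directly. The Python sketch below assumes simple random sampling, as this appendix's footnote on sampling states, and uses hypothetical proportions and subsample sizes chosen only to show the arithmetic. As a rough point of comparison, it also computes the variance pq/n that a direct question asked of all n_1 + n_2 respondents would have, under the hypothetical assumption that such a question could be asked and answered truthfully.

```python
from math import sqrt


def grouped_answers_estimate(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Return (estimate, variance) of the grouped answers estimate p1 - p2.

    p1: proportion of subsample 1 in Box B, Card 1 (n1 respondents)
    p2: proportion of subsample 2 in Box A, Card 2 (n2 respondents)
    Assumes simple random sampling of both subsamples.
    """
    q1, q2 = 1 - p1, 1 - p2
    estimate = p1 - p2
    variance = p1 * q1 / n1 + p2 * q2 / n2
    return estimate, variance


def direct_question_variance(p: float, n: int) -> float:
    """Variance of a proportion from a hypothetical direct question asked of n people."""
    return p * (1 - p) / n


# Hypothetical inputs: 1,000 respondents in each subsample.
est, var = grouped_answers_estimate(p1=0.30, n1=1000, p2=0.25, n2=1000)
direct = direct_question_variance(p=est, n=2000)
print(f"estimate = {est:.3f}, SE = {sqrt(var):.4f}, direct-question SE = {sqrt(direct):.4f}")
```

With these illustrative inputs the grouped answers standard error (about 0.020) is roughly four times the direct-question figure (about 0.005), because the variance depends on the full Box shares p_1 and p_2 rather than on the much smaller difference between them.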

The immigration status cards in figure 3 are designed so that Boxes A and 
B include all major immigration statuses. This design ensures that, on each 
card, the Box B categories apply to the largest possible number of legally 
present respondents. In designing the cards this way, we reasoned that 
this should reduce the question threat associated with choosing Box B. 

As a result, few respondents are expected to choose Box C ("some other 
category not in Box A or Box B"). For example, in the 2004 GSS test, only 
one foreign-born respondent of more than 200 chose Box C. Therefore, we 
believe that for purposes of illustrative variance calculations, it is 
reasonable to assume that no one chooses Box C. Under this assumption, 
the two mirror-image estimates of the percentage of the foreign-born who 
are undocumented would necessarily be exactly the same, as explained 
below. 

Assuming that no respondent chooses Box C, then

$q_1 = 1 - p_1$ = the proportion of subsample 1 in Box A, Card 1
$q_2 = 1 - p_2$ = the proportion of subsample 2 in Box B, Card 2



1 For simplicity, the discussion in this appendix assumes simple random sampling, for both
the main sample and the selection of the two subsamples.






The alternative, mirror-image estimate can then be defined as follows:

$$\text{Mirror-image estimate} = q_2 - q_1$$

As indicated above, $q_1$ and $q_2$ are defined in terms of $p_1$ and $p_2$. Using
algebraic substitution, we have:

$$p_1 - p_2 = (1 - q_1) - (1 - q_2) = 1 - 1 - q_1 + q_2 = q_2 - q_1$$

In other words, under the assumption that no one chooses Box C, the 
mirror-image estimates of the percentage undocumented are, by definition, 
identical. Thus, no precision gain follows from combining them. 2 No 
additional information is provided by a second, mirror-image estimate. 
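The identity can also be checked numerically from box counts. The counts below are hypothetical. The sketch uses exact fractions so that floating-point rounding cannot hide or create differences, and it defines both estimates directly from box shares, which also shows that the two estimates diverge once any respondent chooses Box C.

```python
from fractions import Fraction


def box_estimates(card1: dict[str, int], card2: dict[str, int]) -> tuple[Fraction, Fraction]:
    """Return (grouped answers estimate, mirror-image estimate) from box counts.

    card1 holds Box A/B/C counts for subsample 1 (Card 1); card2 for subsample 2 (Card 2).
    Grouped answers estimate: share in Box B, Card 1 minus share in Box A, Card 2.
    Mirror-image estimate:    share in Box B, Card 2 minus share in Box A, Card 1.
    """
    n1, n2 = sum(card1.values()), sum(card2.values())
    direct = Fraction(card1["B"], n1) - Fraction(card2["A"], n2)
    mirror = Fraction(card2["B"], n2) - Fraction(card1["A"], n1)
    return direct, mirror


# No one in Box C: the two estimates coincide (both equal 3/50).
print(box_estimates({"A": 320, "B": 180, "C": 0}, {"A": 150, "B": 350, "C": 0}))

# Two respondents in Box C on Card 1: the estimates no longer match.
print(box_estimates({"A": 318, "B": 180, "C": 2}, {"A": 150, "B": 350, "C": 0}))
```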

In contrast, quantitative indirect estimates are based on a combination of 
(1) grouped answers data and (2) additional, separate quantitative data or 
estimates (for example, per-person estimates of emergency-visit costs 
based on respondent reports of number of emergency room visits in the 
past year and other information from hospitals on per-visit costs). If the 
quantitative data are tallied or totaled for individuals in each box of each 
card, the result is four different figures, none of which can be derived from 
the others. (There are different respondents in each box, and each would 
have separately reported how many emergency room visits, for example, 
he or she made in the past year.) Thus, for quantitative estimates of this 
type, calculating two independent mirror-image estimates, and averaging 
them, may yield a more precise result. 
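For quantitative estimates, the four box totals are separate data, so the two mirror-image estimates can differ and can be averaged. The sketch below assumes, for illustration only, that each estimate is formed like the proportion estimate but with per-respondent quantities (for example, reported emergency room visits) summed within each box and divided by the subsample size. The function name and data are hypothetical, and the average uses equal weights. The sketch does not compute the variance of the average, which would need to account for the covariance between the two estimates.

```python
def quantitative_mirror_estimates(
    card1_box_a: list[float], card1_box_b: list[float],
    card2_box_a: list[float], card2_box_b: list[float],
    n1: int, n2: int,
) -> tuple[float, float, float]:
    """Return (estimate, mirror-image estimate, equal-weight average).

    Each list holds one reported quantity per respondent in that box;
    n1 and n2 are the full subsample sizes (including any Box C respondents).
    Estimate:     total in Box B, Card 1 per n1 minus total in Box A, Card 2 per n2.
    Mirror image: total in Box B, Card 2 per n2 minus total in Box A, Card 1 per n1.
    """
    estimate = sum(card1_box_b) / n1 - sum(card2_box_a) / n2
    mirror = sum(card2_box_b) / n2 - sum(card1_box_a) / n1
    return estimate, mirror, (estimate + mirror) / 2


# Hypothetical visit counts: four respondents per subsample, none in Box C.
est, mirror, combined = quantitative_mirror_estimates(
    card1_box_a=[1, 0], card1_box_b=[2, 3],
    card2_box_a=[1], card2_box_b=[0, 2, 4],
    n1=4, n2=4,
)
print(est, mirror, combined)
```

Because each box contains different respondents with separately reported quantities, the two estimates here differ (1.0 and 1.25), and averaging them draws on all four box totals.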



2 Logically, if very few persons choose Box C, the precision gains from combining the 
mirror-image estimates (which would necessarily be very similar to each other) would be 
very small. 






Appendix VII: Comments from the 
Department of Commerce 




THE DEPUTY SECRETARY OF COMMERCE 

Washington, D.C. 20230 



September 19, 2006 



Ms. Judith A. Droitcour 

Assistant Director 

Applied Research and Methods 

United States Government Accountability Office 

Washington, DC 20548-0001 

Dear Ms. Droitcour: 

The U.S. Department of Commerce appreciates the opportunity to comment on the 
United States Government Accountability Office's draft report entitled Estimating the 
Undocumented Population: A "Grouped Answers" Approach to Surveying Foreign-Born
Respondents (GAO-06-775). I enclose the Department's comments on this report. 



Enclosure 







U.S. Department of Commerce 
Comments on the 
United States Government Accountability Office 
Draft Report Entitled Estimating the Undocumented Population: A "Grouped Answers"
Approach to Surveying Foreign-Born Respondents (GAO-06-775)
September 2006 



The U.S. Census Bureau generally agrees with the observations in this report but has some 
comments and clarifications about various statements. 

Regarding footnote 1 on page 1:

GAO Report: "Our previous reports and those of other government agencies have sometimes 
used the terms undocumented, illegal aliens, illegal immigrants, unauthorized immigrants, and 
not legally present. We use undocumented here, because this report concerns a technique for 
surveying the foreign-born, an ongoing federal survey uses this term as a response category when 
asking about legal status, and foreign-born respondents appear to understand the term. We 
define undocumented as foreign-born persons who are illegally present in the United States. 
Foreign-born persons (i.e., those not born a U.S. citizen) were born outside the United States to
parents who were both not U.S. citizens at the time of the birth." 

Census Bureau Response: Although the Census Bureau has used the term "undocumented," we 
generally prefer the term "unauthorized" rather than "undocumented." When legal statuses 
associated with the "unauthorized" category are not separately estimable or are demographically 
not meaningful, we use the term "residual" to describe this group. 

Regarding footnote 2 on page 1:

GAO Report: "Most recently, the Census Bureau has stated that among its "enhancement 
priorities" to "improve estimates of net international migration" are efforts to estimate 
"international migrants by migrant status (legal migrants, temporary migrants, quasi-legal 
migrants, unauthorized migrants, and emigrants)" with the overall purpose being to produce 
annual estimates of the U.S. population. ("The U.S. Census Bureau's Intercensal Population 
Estimates and Projections Program: Basic Underlying Principles," paper distributed by the 
Bureau of the Census at its conference on "Population Estimates: Meeting User Needs," 
Embassy Suites, Alexandria, Virginia, July 19, 2006.)" 

Census Bureau Response: The Census Bureau is researching methods of estimating the size of 
the foreign-born population by legal status. 






Regarding footnote 51 on page 29: 

GAO Report: "We note that these two examples involve agencies that are apparently viewed 
neutrally by the immigrant community. Agencies that are negatively viewed by at least some are 
the Department of Homeland Security (DHS) and Census." 

Census Bureau Response: We are not aware of empirical evidence that the Census Bureau is 
viewed negatively by any specific groups. 

Our specific comments about the report are as follows: 

Pages 6 to 15: The description of the "grouped response" method is accurate, including the 
discussion of strengths and limitations. 

Pages 21 to 26 and pages 64 to 68: The discussion of the Census Bureau-sponsored General 
Social Survey evaluation, including its strengths and limitations, and Dr. Zaslavsky's evaluation 
are accurately described. 

Pages 35 to 38: The Census Bureau agrees that a "validity study" is a good idea. The "validity 
study" of the grouped response methods would need to be performed to determine if the 
"grouped response" method can be used and will generate accurate estimates. 






Appendix VIII: Comments from the 
Department of Homeland Security 



U.S. Department of Homeland Security 

Washington, DC 20528 







September 12, 2006 



Ms. Nancy R. Kingsbury 
Managing Director 
Applied Research and Methods 
U.S. Government Accountability Office
Washington, DC 20548 

Re: Draft Report GAO-06-775, "Estimating the Undocumented Population: A 'Grouped
Answers' Approach to Surveying Foreign-Born Respondents."

Thank you for the opportunity to review the draft report. GAO demonstrates that the "grouped 
answers" approach to surveying foreign-born respondents has the potential to capture information 
on unauthorized aliens in the United States that is not available using existing methods and 
sources. They also serve notice that there are significant hurdles to implementing the approach. 
The Office of Immigration Statistics (OIS) believes that information on immigration status and 
the characteristics of those immigrants potentially available through this method would be useful 
for evaluating immigration programs and policies (e.g., characteristics of unauthorized aliens, 
program benefit use, and method of entry). We therefore recommend that GAO pilot the 
methodology in a limited geographic area in order to determine whether the information can be 
collected reliably, and to better estimate costs of a national survey. Our more specific comments 
to the report are listed below. 

If a new survey needs to be developed, then it should be designed to cover all foreign-born
persons in the country, no matter how long they have been in the United States. The current national surveys are
limited to those who have lived here at least 2 months and likely exclude some unauthorized and 
temporary migrants. 

The GAO report (page 53) suggests that the reliability of lawfully admitted immigrants'
responses could be tested by making comparisons with publicly available administrative 
information. The comparisons may not be made as directly as implied because administrative 
data on immigrant flows will have to be adjusted for estimated changes in population, such as 
through emigration and mortality. 



Sincerely, 



Steven J. Pecinovsky
Director 

Departmental GAO/OIG Liaison Office 




www.dhs.gov 






Appendix IX: Comments from the 
Department of Health and Human Services 




DEPARTMENT OF HEALTH & HUMAN SERVICES 



Office of the Assistant Secretary 
for Legislation 



Washington, D.C. 20201 



SEP 12 2006



Nancy R. Kingsbury 

Managing Director, Applied Research and Methods 
U.S. Government Accountability Office 
Washington, DC 20548 

Dear Ms. Kingsbury: 

Enclosed are the Department's comments on the U.S. Government Accountability 
Office's (GAO) draft report entitled, "Estimating the Undocumented Population: A 
Grouped Answers Approach to Surveying Foreign-Born Respondents" (GAO-06-775), 
before its publication. 

These comments represent the tentative position of the Department of Health and Human 
Services and are subject to reevaluation when the final version of this report is received. 

The Department provided several technical comments directly to your staff. 

The Department appreciates the opportunity to comment on this draft report before its 
publication. 



Sincerely, 




Vincent J. Ventimiglia, Jr. 
Assistant Secretary for Legislation 






COMMENTS FROM THE DEPARTMENT OF HEALTH AND HUMAN 
SERVICES ON ESTIMATING THE UNDOCUMENTED POPULATION: 
A GROUP ANSWERS APPROACH TO SURVEYING FOREIGN-BORN
RESPONDENTS GAO-06-775 

HHS Comments 

GAO is correct in their assessment that the National Survey on Drug Use and Health 
(NSDUH) is NOT appropriate for collecting data on immigration status. NSDUH has a 
large number of sensitive questions on the use of illicit drugs that may cause persons with 
undocumented status to not select the correct box in the "grouped answers" section out of 
fear of somehow being identified. Also, the fact that NSDUH is sponsored by a 
government agency may not be acceptable to foreign-born respondents. The report
indicated that this population may feel more comfortable responding to a study sponsored 
by a university or private sector organization. 

The procedure used to estimate the size of the undocumented population is provided on 
page 12; however, it does not indicate that the "mirror-image" estimate could be used in 
combination with the other estimate in an attempt to reduce variance. If there is some 
variance reduction, this could mean that a smaller sample size is needed, thus reducing
costs. 

Add an appendix where formulas are presented on the estimation of the undocumented 
population along with its variance. Include the combination of the "mirror-image" 
estimate and its variance. How does the variance of the "grouped answers" estimate 
compare to an estimate based on a question asked directly? Even though asking a direct 
question is not feasible, we can get a perspective on how different the "grouped answers" 
variance is from the variance of a more traditional estimator.

Disclosure of use of data: The respondents are shown three boxes. Each one lists several 
possible immigration statuses, including United States citizen and legal permanent 
resident, as well as undocumented resident (See pages 8-9). The undocumented status 
always appears in Box B along with other responses. The respondents are asked to 
choose the box that contains their immigration status. If they choose the one with the 
undocumented status which is always Box B, they are told, "If the specific category that 
applies to you is in Box B, we do not want to know which one it is because we are 
focusing on Box A categories." While it's true that the interviewers do not want to know 
the specific immigration status for any specific respondent, it is not true that they are 
focusing on Box A categories. In fact, the entire purpose of the exercise is to estimate 
how many people are undocumented by extrapolating from the number that choose Box 
B. 






Appendix X: GAO Contact and Staff 
Acknowledgments 



GAO Contact Nancy R. Kingsbury, (202) 512-2700 or kingsburyn@gao.gov. 



Staff Acknowledgments

Staff contributing to this report include Judith A. Droitcour,
Eric M. Larson, and Penny Pickett. Statistical support was provided by
Sid Schwartz, Mark Ramage, and Anna Maria Ortiz.






Bibliography 



Bird, Ronald. Statement of Ronald Bird, Chief Economist, Office of the 
Assistant Secretary for Policy, U.S. Department of Labor, before the 
Committee on the Judiciary, U.S. Senate, July 5, 2006. 

Boruch, Robert, and Joe S. Cecil. Assuring the Confidentiality of Social 
Research Data. Philadelphia: University of Pennsylvania Press, 1979. 

Camarota, Steven A., and Jeffrey Capizzano. "Assessing the Quality of Data
Collected on the Foreign Born: An Evaluation of the American Community
Survey (ACS): Pilot and Full Study Findings." Immigration Studies White
Papers, Sabre Systems Inc., April 2004.
http://www.sabresys.com/whitepapers/CIS_whitepaper.pdf (Sept. 6, 2006).

Costanzo, Joseph, and others. "Evaluating Components of International
Migration: The Residual Foreign-Born," Population Division Working 
Paper 61, U.S. Census Bureau, Washington, D.C., June 2002, p. 22. 

Droitcour, Judith A., and Eric M. Larson. "An Innovative Technique for
Asking Sensitive Questions: The Three-Card Method," Bulletin de 
Methodologie Sociologique, 75 (July 2002): 5-23. 

El-Badry, Samia, and David A. Swanson. "Providing Special Census
Tabulations to Government Security Agencies in the United States: The 
Case of Arab-Americans," paper presented at the 25th International 
Population Conference of the International Union for the Scientific Study 
of Population, Tours, France, July 18-23, 2005. 

Hill, Kenneth. "Estimates of Legal and Unauthorized Foreign-Born 
Population for the United States and Selected States Based on Census 
2000." Presentation at the U.S. Census Bureau Conference, Immigration 
Statistics: Methodology and Data Quality, Alexandria, Virginia, February 
13-14, 2006.

Hoefer, Michael, Nancy Rytina, and Christopher Campbell. Estimates of 
the Unauthorized Immigrant Population Residing in the United States: 
January 2005. Washington, D.C.: Department of Homeland Security, 
Office of Immigration Statistics, August 2006. 

GAO. Undocumented Aliens: Questions Persist about Their Impact on 
Hospitals' Uncompensated Care Costs, GAO-04-472. Washington, D.C.: 
May 21, 2004. 






GAO. Illegal Alien Schoolchildren: Issues in Estimating State-by-State
Costs, GAO-04-733. Washington, D.C.: June 23, 2004. 

GAO. Overstay Tracking: A Key Component of Homeland Security and a 
Layered Defense, GAO-04-82. Washington, D.C.: May 21, 2004. 

GAO. Record Linkage and Privacy: Issues in Creating New Federal 
Research and Statistical Information, GAO-01-126SP. Washington, D.C.: 
April 2001. 

GAO. Survey Methodology: An Innovative Technique for Estimating 
Sensitive Survey Items, GAO/GGD-00-30. Washington, D.C.: November 
1999. 

GAO. Immigration Statistics: Information Gaps, Quality Issues Limit 
Utility of Federal Data to Policymakers, GAO/GGD-98-164. Washington, 
D.C.: July 31, 1998. 

Greenberg, Bernard G., and others. "The Unrelated Questions Randomized 
Response Model: Theoretical Framework." Journal of the American 
Statistical Association, 64 (1969): 520-39. 

Kincannon, Charles Louis, "Procedures for Providing Assistance to 
Requestors for Special Data Products Known as Special Tabulations and 
Extracts," memorandum to Associate Directors, Division Chiefs, Bureau of 
the Census, Washington, D.C., August 26, 2004. 

Locander, William, and others. "An Investigation of Interview Method, 
Threat, and Response Distortion." Journal of the American Statistical 
Association, 71 (1976): 269-75. 

National Research Council, Committee on National Statistics, Local Fiscal 
Effects of Illegal Immigration: Report of a Workshop. Washington, D.C.: 
National Academy Press, 1996. 

Passel, Jeffrey S. "The Size and Characteristics of the Unauthorized 
Migrant Population in the U.S.: Estimates Based on the March 2005 
Current Population Survey." Research Report. Washington, D.C.: 
Pew Hispanic Center, March 7, 2006. 






Passel, Jeffrey S., Rebecca L. Clark, and Michael Fix. "Naturalization and 
Other Current Issues in U.S. Immigration: Intersections of Data and 
Policy." In Proceedings of the Social Statistics Section of the American 
Statistical Association: 1997. Alexandria, Va.: American Statistical 
Association, 1997. 

Robinson, J. Gregory. "Memorandum for Donna Kostanich." DSSD A.C.E. 
Revision II Memorandum Series No. PP-36. Washington, D.C.: U.S. Bureau 
of the Census, December 31, 2002. 

Rytina, Nancy F. Estimates of the Legal Permanent Resident Population 
and Population Eligible to Naturalize in 2004. Washington, D.C.: 
Department of Homeland Security, Office of Immigration Statistics, 
February 2006. 

Shryock, Henry S., Jacob S. Siegel, and Associates. The Methods and 
Materials of Demography. Washington, D.C.: U.S. Government Printing 
Office, 1980. 

Siegel, Jacob S., and David A. Swanson. The Methods and Materials of 
Demography, 2nd ed. San Diego, Calif.: Elsevier Academic Press, 2004. 

U.S. Census Bureau, "The U.S. Census Bureau's Intercensal Population 
Estimates and Projections Program: Basic Underlying Principles," paper 
distributed by the Census Bureau at its conference on Population 
Estimates: Meeting User Needs, Alexandria, Virginia, July 19, 2006. 

U.S. Commission on Immigration Reform. U.S. Immigration Policy: 
Restoring Credibility: 1994 Report to Congress. Washington, D.C.: 
U.S. Government Printing Office, 1994. 

U.S. Immigration and Naturalization Service, Office of Policy and Planning. 
Estimates of the Unauthorized Immigrant Population Residing in the 
United States: 1990 to 2000. Washington, D.C.: January 2003. 

U.S. Department of Labor, Findings from the National Agricultural 
Workers Survey (NAWS) 2000-2002: A Demographic and Employment 
Profile of United States Farm Workers. Research Report 9. Washington, 
D.C.: March 2005. 

Warner, Stanley. "Randomized Response: A Survey Technique for 
Eliminating Evasive Answer Bias." Journal of the American Statistical 
Association, 60 (1965): 63-69. 






Warren, Robert, and Jeffrey S. Passel. "A Count of the Uncountable: 
Estimates of Undocumented Aliens Counted in the 1980 Census." 
Demography, 24:3 (1987): 375-93. 



(460577) 






GAO's Mission 

The Government Accountability Office, the audit, evaluation and 
investigative arm of Congress, exists to support Congress in meeting its 
constitutional responsibilities and to help improve the performance and 
accountability of the federal government for the American people. GAO 
examines the use of public funds; evaluates federal programs and policies; 
and provides analyses, recommendations, and other assistance to help 
Congress make informed oversight, policy, and funding decisions. GAO's 
commitment to good government is reflected in its core values of 
accountability, integrity, and reliability. 



Obtaining Copies of GAO Reports and Testimony 

The fastest and easiest way to obtain copies of GAO documents at no cost 
is through GAO's Web site (www.gao.gov). Each weekday, GAO posts 
newly released reports, testimony, and correspondence on its Web site. To 
have GAO e-mail you a list of newly posted products every afternoon, go 
to www.gao.gov and select "Subscribe to Updates." 



Order by Mail or Phone 

The first copy of each printed report is free. Additional copies are $2 each. 

A check or money order should be made out to the Superintendent of 
Documents. GAO also accepts VISA and Mastercard. Orders for 100 or 
more copies mailed to a single address are discounted 25 percent. Orders 
should be sent to: 

U.S. Government Accountability Office 
441 G Street NW, Room LM 
Washington, D.C. 20548 

To order by Phone: Voice: (202) 512-6000 
TDD: (202) 512-2537 
Fax: (202) 512-6061 



To Report Fraud, Waste, and Abuse in Federal Programs 

Contact: 

Web site: www.gao.gov/fraudnet/fraudnet.htm 
E-mail: fraudnet@gao.gov 
Automated answering system: (800) 424-5454 or (202) 512-7470 



Congressional Relations 

Gloria Jarmon, Managing Director, JarmonG@gao.gov (202) 512-4400 
U.S. Government Accountability Office, 441 G Street NW, Room 7125 
Washington, D.C. 20548 



Public Affairs 

Paul Anderson, Managing Director, AndersonPl@gao.gov (202) 512-4800 
U.S. Government Accountability Office, 441 G Street NW, Room 7149 
Washington, D.C. 20548 



PRINTED ON RECYCLED PAPER 



